Abstract



We examine the issue of separation and code design for networks that operate over
finite fields. We demonstrate that source-channel (or source-network) separation holds
for several canonical network examples like the noisy multiple access channel and the
erasure degraded broadcast channel, when the whole network operates over a common
finite field. This robustness of separation is predicated on the fact that noise and inputs
are independent, and we examine the failure of separation when noise is dependent on
inputs in multiple access channels.

Our approach is based on the sufficiency of linear codes. Using a simple and unifying 
framework, we not only re-establish with economy the optimality of linear codes for 
single-transmitter, single-receiver channels and for Slepian-Wolf source coding, but also 
establish the optimality of linear codes for multiple access and for erasure degraded 
broadcast channels. The linearity allows us to obtain simple optimal code constructions 
and to study capacity regions of the noisy multiple access and the degraded broadcast 
channel. The linearity of both source and network coding blurs the delineation between 
source and network codes. While our results point to the fact that separation of
source coding and channel coding is optimal in some canonical networks, we show
that decomposing networks into canonical subnetworks may not be effective. Thus, we
argue that it may be the lack of decomposability of a network into canonical network
modules, rather than the lack of separation between source and channel coding, that
presents major challenges for coding over networks.

Figure 1: A linear network for which source-channel separation fails [2].

1 Introduction 

The failure of source-channel separation in networks is often considered to be an impedi- 
ment in applying information theoretic tools in network settings. The simple multiple access 
channel of Figure 1 gives one example of how separation can fail [2]. The receiver's channel
output is the integer sum of the binary channel inputs of m > 2 users, yielding a channel 
output alphabet of size m + 1. Since independent, uniformly distributed input signals fail 
to achieve the maximum mutual information between the transmitted and received signals, 
direct transmission of dependent source bits over the channel without channel coding some- 
times yields higher achievable transmission rates than Slepian-Wolf source coding followed 
by multiple access channel coding. 

While this simple example may at first appear to establish irrefutably the failure of source- 
channel separation in networks, its simplicity is misleading. In particular, note that the 
alphabet size of the output is dependent on the number of transmitters. Thus, the network 
lacks a consistent digital framework. Replacing integer addition with binary addition to give 
a channel with input and output alphabets of the same cardinality yields a communication 
system for which separation holds. 
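The contrast between integer and binary addition can be checked numerically. The following is a minimal sketch, assuming m = 2 users for illustration; the helper names and the particular dependent input distribution are our own choices. Because both channels are deterministic, I(X1, X2; Y) = H(Y), so it suffices to compare output entropies.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def output_entropy(joint, channel):
    """H(Y) when (X1, X2) ~ joint and Y = channel(X1, X2) is deterministic."""
    out = {}
    for (x1, x2), p in joint.items():
        y = channel(x1, x2)
        out[y] = out.get(y, 0.0) + p
    return entropy(out.values())

def product_joint(p1, p2):
    """Joint pmf of independent bits with P(X1=1)=p1 and P(X2=1)=p2."""
    return {(x1, x2): (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
            for x1 in (0, 1) for x2 in (0, 1)}

def best_product(channel, steps=100):
    """Grid search of H(Y) over independent (product) input distributions."""
    grid = [i / steps for i in range(steps + 1)]
    return max(output_entropy(product_joint(a, b), channel)
               for a in grid for b in grid)

def integer_sum(x1, x2):
    return x1 + x2

def binary_sum(x1, x2):
    return x1 ^ x2

# Hypothetical dependent input pair that makes the three integer-sum
# outputs {0, 1, 2} equally likely.
dependent = {(0, 0): 1 / 3, (0, 1): 1 / 6, (1, 0): 1 / 6, (1, 1): 1 / 3}

adder_product = best_product(integer_sum)
adder_dependent = output_entropy(dependent, integer_sum)
xor_product = best_product(binary_sum)
xor_dependent = output_entropy(dependent, binary_sum)

print(f"integer sum: independent {adder_product:.3f}, dependent {adder_dependent:.3f}")
print(f"binary sum:  independent {xor_product:.3f}, dependent {xor_dependent:.3f}")
```

Under these assumptions, the integer-sum channel supports about 1.5 bits with the best independent inputs but about 1.585 bits (log2 3) with the dependent pair, whereas the binary-sum channel can never exceed 1 bit, which independent uniform inputs already attain.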

In this paper, we argue that source-channel separation is more robust than counterex- 
amples may suggest. We assert, however, that separate source and channel code design 
does not necessarily simplify the design of communication systems for digital networks. The 
operations of compression and channel coding are conceptual tools rather than necessary 
components. While modularity, such as that afforded by the separation theorem, is desir- 
able in the design of components, the decomposition of a problem into modular tasks may 
increase complexity when the decomposition imposes unnecessary constraints. 

In addition to examining traditional questions of source-channel separation, we also in- 
vestigate a variety of other separation assumptions implicit in common network design tech- 
niques. By assuming independent data bits and lossless links, layered approaches to network 
design endorse a philosophy where source and channel coding are separated from network 
coding or routing. Through examples, we demonstrate the fragility of this assumed separa- 
tion. Even in simple digital networks, neither separate source-network coding strategies nor 
separate channel-network coding techniques guarantee optimal communication performance.

Our network model requires a common finite field alphabet at all nodes but allows noise 
in the form of erasures or additive noise. We treat two important types of canonical networks: 
noisy multiple access networks, such as may arise in wireless transmissions, and degraded 
erasure broadcast networks, such as may arise in wireline broadcasting with packet losses 
due to congestion. These networks are not only some of the fundamental building blocks 
of network information theory, but also generally demonstrate the breakdown of separation 
between source coding and coding over the channel or, rather, network. We establish a 
simple and economical framework and use that framework to first re-derive classical results 
for source compression and coding for the single transmitter, single receiver channel. We then 
use the same framework to prove that linear codes are sufficient and asymptotically optimal 
for the noisy multiple access and erasure degraded broadcast networks. We also show that 
optimal code construction for these networks is particularly simple. Our approach may be 
viewed, in the simplest way, as a generalization of information theoretic results known for 
single-receiver source codes and for single-transmitter, single-receiver channel codes. From 
the networking perspective, our results bear the interpretation that separate optimization of 
compression, channel coding, and routing fails to achieve the optimal performance. 

For multiple access networks, we show that source-channel separation is optimal for input- 
independent noise which may be additive or in the form of erasures. Using this property, 
we prove the optimality of linear codes for multiple access networks. However, separation 
may fail to achieve the optimal performance when additive noise is input-dependent. For the 
additive noise channel over the binary field, we compute the maximum difference between the 
sum channel capacities when channel encoding is done with complete collaboration between 
the channel encoders and with no collaboration between the channel encoders. We also
obtain an expression for the probability that the two sum capacities differ for a binary 
additive noise multiple access channel picked randomly from the ensemble of all channels of 
this class. We present an optimal multiple access systematic channel code construction and 
provide coding techniques when transmissions are bursty. 

Though source-channel separation may not hold in general for broadcast channels, we 
show that it does hold for the erasure broadcast channel. Thus, for erasure broadcast chan- 
nels, we show that linear codes are also optimal. 

Finally, while the multiple access and broadcast networks considered here are important 
in their own right, we show that we cannot concatenate them arbitrarily and maintain end- 
to-end functionality. In effect, there may not be separation of large networks into canonical 
elements. We argue that this lack of separation may pose a real challenge in communication- 
network system design. 

In section 2, we consider source-channel separation for multiple access networks. We dis- 
cuss background, establish preliminaries and re-derive classical results for single-transmitter, 
single-receiver networks in section 3. In sections 4 and 5, we consider multiple access net- 
works and broadcast networks, respectively. We address the issue of decomposability of a 
network into canonical elements and conclude in section 6. 
Figure 2: Various single-transmitter, single-receiver schemes.

2 Source-channel separation for multiple access networks

It is well known that source-channel separation holds for single-transmitter, single-receiver
networks. Thus, the source and channel coding operations can be separated without loss
in optimality. The joint and separate source-channel coding schemes for single-transmitter,
single-receiver networks are shown in Figure 2.

We now address the topic of separation for source-multiple access channel pairs. Consider
binary source pairs $(U_1, U_2)$ and two transmitters transmitting binary symbols to a
single receiver whose received alphabet is also binary. We denote the channel inputs, their
associated rates and output symbol as $(X_1, X_2)$, $(R_1, R_2)$ and $Y$, respectively.
Let us summarize some known results on multiple access capacity regions. There are three
categories of multiple access:

• The most general multiple access is when the channel encoding is done with full collaboration
between the channel encoders. Optimal source coding can be performed
with or without [22] collaboration between the source encoders. Moreover, there is
no loss in optimality in separating the source and channel encoding operations since
full collaboration exists between the two channel encoders. We will call this multiple
access scheme "Collaborative Multiple Access" (CMA). The capacity region for this
type of multiple access is derived by Liao [7] in his PhD thesis.

• The second type of multiple access is when the source and channel coding at each
transmitter is combined into a single operation and there is no collaboration between
the joint source-channel encoders at the two transmitters. The encoders directly map
the source pairs to channel inputs. We will refer to this multiple access scheme as
"Non-collaborative Joint Multiple Access" (NJMA) and the capacity region for this
scheme is derived by Cover and El Gamal [2].

• The least general, but most often considered, is multiple access where source and channel
coding are performed separately at each transmitter and there is no collaboration
between the encoders at the transmitters. We will refer to this multiple access scheme
as "Non-collaborative Separate Multiple Access" (NSMA) and the capacity region
for this multiple access scheme is derived by Cover and Wyner [8].

Figure 3: Various multiple access schemes.

The three multiple access schemes are shown in Figure 3. Note that the figure
for NSMA is the same as Figure 1 in

2.1 CMA capacity region 

For this type of multiple access, the channel encoders cooperate with each other and can generate
any joint input probability distribution, $P_{X_1 X_2}(x_1, x_2)$. For any $P_{X_1 X_2}(x_1, x_2)$, denote
the closure of the convex hull of all rate pairs $(R_1, R_2)$ satisfying

$$
\begin{aligned}
R_1 &< I(X_1; Y \mid X_2), \\
R_2 &< I(X_2; Y \mid X_1), \\
R_1 + R_2 &< I(X_1, X_2; Y),
\end{aligned}
$$


MIT LIDS Technical Report 2687, March 2006 






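as $\mathcal{L}[P_{X_1 X_2}(x_1, x_2)]$. The CMA capacity region, $\mathcal{M}_{\mathrm{CMA}}$, is the convex hull of the sets
$\mathcal{L}[P_{X_1 X_2}(x_1, x_2)]$ over all joint input probability distributions, $P_{X_1 X_2}(x_1, x_2)$. Denoting the
convex hull operation over sets as $CH(\cdot)$, we have

$$
\mathcal{M}_{\mathrm{CMA}} = CH\Big( \bigcup_{P_{X_1 X_2}} \mathcal{L}[P_{X_1 X_2}(x_1, x_2)] \Big). \tag{1}
$$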

As the encoders cooperate in this multiple access scheme, we also refer to the CMA capacity 
as the "cooperative capacity" . 
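The three mutual-information bounds that define such a rate region can be evaluated directly for a small channel. The sketch below is illustrative only: it assumes a binary additive-noise multiple access channel Y = X1 ⊕ X2 ⊕ Z with Z ~ Bernoulli(0.1) independent of the inputs, and the function names (`mac_bounds`, `marginal`) are hypothetical.

```python
import math
from itertools import product

def h2(q):
    """Binary entropy function in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def entropy(pmf):
    """Shannon entropy in bits of a pmf stored as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a pmf over tuples onto the coordinate indices in keep."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def mac_bounds(p_inputs, channel):
    """Return (I(X1;Y|X2), I(X2;Y|X1), I(X1,X2;Y)) in bits.

    p_inputs[(x1, x2)] is the joint input pmf and
    channel[(x1, x2)][y] is the transition probability P(y | x1, x2).
    """
    joint = {}
    for (x1, x2), px in p_inputs.items():
        for y, pyx in channel[(x1, x2)].items():
            if px * pyx > 0:
                joint[(x1, x2, y)] = joint.get((x1, x2, y), 0.0) + px * pyx
    H = lambda keep: entropy(marginal(joint, keep))
    h_y_given_inputs = H((0, 1, 2)) - H((0, 1))        # H(Y | X1, X2)
    i1 = (H((1, 2)) - H((1,))) - h_y_given_inputs      # H(Y | X2) - H(Y | X1, X2)
    i2 = (H((0, 2)) - H((0,))) - h_y_given_inputs      # H(Y | X1) - H(Y | X1, X2)
    i12 = H((2,)) - h_y_given_inputs                   # H(Y) - H(Y | X1, X2)
    return i1, i2, i12

q = 0.1  # assumed noise parameter
channel = {(x1, x2): {x1 ^ x2: 1 - q, x1 ^ x2 ^ 1: q}
           for x1, x2 in product((0, 1), repeat=2)}
uniform = {(x1, x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

i1, i2, i12 = mac_bounds(uniform, channel)
print(f"I(X1;Y|X2)={i1:.3f}  I(X2;Y|X1)={i2:.3f}  I(X1,X2;Y)={i12:.3f}")
```

For this assumed channel, independent uniform inputs already make all three bounds equal to 1 - h(q), the largest value any input distribution can give, since H(Y) is at most 1 bit and H(Y | X1, X2) = h(q).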

2.2 NSMA capacity region 

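In this multiple access scheme, the channel encoders cannot cooperate and have independent
inputs which come, for example, from Slepian-Wolf source encoders. Let $P_{X_1}(x_1)$ and $P_{X_2}(x_2)$
be the distributions on the two independent channel inputs. For any product distribution,
$P_{X_1}(x_1) P_{X_2}(x_2)$, denote the closure of the convex hull of all rate pairs $(R_1, R_2)$ satisfying

$$
\begin{aligned}
R_1 &< I(X_1; Y \mid X_2), \\
R_2 &< I(X_2; Y \mid X_1), \\
R_1 + R_2 &< I(X_1, X_2; Y),
\end{aligned}
$$

as $\mathcal{C}[P_{X_1}(x_1) P_{X_2}(x_2)]$. The NSMA capacity region, $\mathcal{M}_{\mathrm{NSMA}}$, is the convex hull of the sets
$\mathcal{C}[P_{X_1}(x_1) P_{X_2}(x_2)]$ over all product input probability distributions, $P_{X_1}(x_1) P_{X_2}(x_2)$. Hence,
we have

$$
\mathcal{M}_{\mathrm{NSMA}} = CH\Big( \bigcup_{P_{X_1} P_{X_2}} \mathcal{C}[P_{X_1}(x_1) P_{X_2}(x_2)] \Big). \tag{2}
$$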

Owing to lack of cooperation, the channel encoders cannot increase the correlation between
the inputs, so the channel inputs are always independent. This makes the
NSMA capacity region an improper$^1$ subset of the CMA capacity region, since not all joint input
distributions can be generated. As the encoders in this multiple access scheme do not
cooperate, we also refer to the NSMA capacity as the "separate capacity".

2.3 NJMA capacity region 

In this multiple access scheme, there is a single joint source-channel encoder at each transmitter
that maps source symbols to channel inputs. The encoders at the two transmitters do
not cooperate. This encoder is more general than the combination of the NSMA source and
channel encoders, since it can make use of the dependence between the sources to increase the
channel mutual information. As the set of channel input distributions that can be generated
is larger than that of the NSMA scheme, the NSMA capacity region is an improper subset
of the NJMA capacity region. Also, the NJMA capacity region is an improper subset of the
CMA capacity region, since the channel encoders cannot generate all joint input probability
distributions, owing to lack of coordination. Only those joint input probability distributions
that do not require the correlation between channel inputs to be more than the correlation
between the source pairs can be generated. Therefore, the set of joint input distributions
that can be generated depends on the source that is being transmitted. We denote the set
of joint input distributions that can be generated as $\mathcal{F}_{X_1 X_2}$.

1 In this paper an improper subset (superset) of a set A is defined as a set which is smaller (greater) than or
equal to A.

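For any $P_{X_1 X_2}(x_1, x_2) \in \mathcal{F}_{X_1 X_2}$, denote the closure of the convex hull of all rate pairs $(R_1, R_2)$
satisfying

$$
\begin{aligned}
R_1 &< I(X_1; Y \mid X_2), \\
R_2 &< I(X_2; Y \mid X_1), \\
R_1 + R_2 &< I(X_1, X_2; Y),
\end{aligned}
$$

as $\mathcal{S}[P_{X_1 X_2}(x_1, x_2)]$. The NJMA capacity region, $\mathcal{M}_{\mathrm{NJMA}}$, is the convex hull of the sets
$\mathcal{S}[P_{X_1 X_2}(x_1, x_2)]$ over all joint input probability distributions in $\mathcal{F}_{X_1 X_2}$. Hence, we have

$$
\mathcal{M}_{\mathrm{NJMA}} = CH\Big( \bigcup_{P_{X_1 X_2} \in \mathcal{F}_{X_1 X_2}} \mathcal{S}[P_{X_1 X_2}(x_1, x_2)] \Big). \tag{3}
$$

In this paper, we will also refer to this capacity region as the "joint source-channel capacity
region".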

2.4 A sufficient criterion for separation to hold 

Consider the transmission of a binary source pair over a multiple access channel. Even if 
we allow cooperation, there is no hope of transmitting the source pair over the channel with 
an arbitrarily small error probability unless the Slepian-Wolf source coding region and the 
CMA capacity region have a non-empty intersection. The question of source-channel separation
therefore applies to those source-channel pairs for which these two regions overlap. Hence, 
for any channel, while considering source-channel separation, we always restrict our attention 
to source pairs whose Slepian-Wolf region overlaps the CMA capacity region of that channel. 
If source and channel coding are done separately without coordination between the two 
transmitters, then only those source pairs for which the Slepian-Wolf region overlaps with 
the NSMA capacity region can be reliably transmitted over the multiple access channel. 
In NJMA, the encoders can make use of the correlation between the sources and hence the 
NJMA capacity region is an improper superset of the NSMA capacity region. However, since 
the encoders at the transmitters do not coordinate, this region is an improper subset of the 
CMA capacity region, in general. 

Figure 4 shows the CMA and NSMA capacity regions for a multiple access channel. For
ease of illustration, we have considered a multiple-access channel whose capacity regions 
are pentagons. The regions, in general, may not be pentagons. The Slepian-Wolf regions 
for three different source pairs are also shown. ABCDO is the NSMA capacity region and 
PQRSO the CMA capacity region. Source-channel separation holds for all source pairs 
whose Slepian-Wolf region overlaps ABCDO. For source pairs whose Slepian-Wolf region 
overlaps only PQRSDCBA, separation may fail. Separation fails for those source pairs
for which the NJMA joint source-channel encoder can increase the capacity region beyond
the NSMA capacity region so that it intersects the source pair's Slepian-Wolf source coding
region, which allows the source pair to be reliably communicated. The joint source-channel 
encoders make use of the source statistics to increase correlation between the channel inputs, 
which may increase the capacity region. More simply, the joint source-channel encoders try to 
match the source statistics to those needed by the channel to maximize mutual information. 
However, when the joint source-channel encoders cannot increase the capacity region enough 
to overlap the source pair's Slepian-Wolf region, reliable communication is impossible. In this 
case we say that separation holds since joint and separate source-channel codes fail equally. 
Note that the NJMA capacity region cannot increase beyond the CMA capacity region. 
Separation fails for the example in [2], since the source pair statistics are perfectly matched
to what is required to maximize the channel mutual information. Source pairs whose Slepian- 
Wolf regions lie outside PQRSO cannot be reliably transmitted over the channel with or 
without coordination between the transmitters. 

We now derive a sufficient criterion for separation to hold. Since the NJMA capacity region 
is an improper subset of the CMA capacity region, a sufficient condition for separation to 
hold for any source-channel pair is that the NSMA capacity region for the channel is the 
same as its CMA capacity region. For these channels, increasing correlation between the 
channel inputs does not increase mutual information. Note that for these channels, the 
region PQRSDCBA is a null set. We obtain a lemma that states this sufficient criterion 
for separation to hold for any source-channel pair: 

Lemma 1 Separation holds for a multiple access source-channel pair if for the channel the
following is satisfied

$$
\mathcal{M}_{\mathrm{NSMA}} = \mathcal{M}_{\mathrm{CMA}}.
$$

If the sources are independent, joint source-channel coding is equivalent to separate source- 
channel coding. This yields the following lemma: 

Lemma 2 Separation holds for a multiple access source-channel pair if the sources are
independent.

3 Background, Preliminaries, Single Transmitter-Single 
Receiver Networks 

3.1 Background 

The use of random linear transformations in coding appears early in the information-theoretic 
literature. For channel coding, Elias [3] shows that random linear parity check codes, formed 
by Bernoulli (1/2) choices for the parity check entries in a systematic code's generator matrix, 
achieve capacity for the binary erasure channel and the binary symmetric channel. Elias 
also gives a construction for sliding parity check codes requiring fewer random binary digits. 
MacKay jl| proves that two families of error- correcting codes based on very sparse random 
parity check matrices - Gallager codes and MacKay- Neal codes (a special case of the former) 
- when optimally decoded, achieve information rates up to the Shannon limit for channels 
with symmetric stationary ergodic noise. MacKay also demonstrates empirically, for binary
symmetric channels and Gaussian channels, that good decoding performance for these codes 
can be achieved with a practical sum-product decoding algorithm. 

Linear channel coding for network systems has received far less attention. In this work, 
we consider both multiple access and degraded broadcast channels. Multiple access networks 
comprise a collection of transmitters sending information to a single receiver. In our network 
model, the received signal is the sum of the transmitted signals with the possible inclusion of 
either erasures or additive noise. While this type of additive interference channel has received 
considerable attention in the literature (see, for example, [U] 171 l§] ITO"] ITT| 112] ITS] 1X3] ITo] IT7j) 
the majority of the work to date considers only the case where the incoming data streams 
interfere additively in the real field. Notable exceptions are the works of Kautz in jT3J and 
Poltyrev and Snyders in jT^]. Kautz introduces superimposed codes in jT3], for which the 
received symbol is the boolean sum ("OR") of the channel inputs and in [TJJ], Poltyrev and 
Snyders treat a modulo-2 multiple access channel without noise. Both works consider the 
case where a proper subset of the transmitters transmit to the decoder at any given instant. 
We are unaware of prior work on linear coding for multiple access channels. 

In broadcast networks, we consider physically and stochastically degraded channels with 
both additive noise and erasures. While the degraded broadcast channel is well under- 
stood, ^Sl EI] , we are likewise unaware of any prior work on linear broadcast channel codes. 

On the source coding side, Ancheta f2D] presents universally optimal linear codes for 
lossless coding of binary sources in point-to-point networks; he also shows that the rate 
distortion function of a binary, stationary, memoryless source cannot be achieved by any 
linear transformation over a binary field into a sequence with rate lower than the entropy 
of the source. The syndrome-source-coding scheme described by Ancheta uses a linear error 
correcting code for data compression, treating the source sequence as an error pattern whose 
syndrome forms the compressed data. 
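The syndrome idea can be illustrated with a toy example. The sketch below is not Ancheta's construction: it assumes, purely for illustration, a sparse binary source that emits 7-bit blocks containing at most one 1, and it uses the parity check matrix of the (7,4) Hamming code to compress each block to a 3-bit syndrome. All function names are hypothetical.

```python
# Parity check matrix of the (7,4) Hamming code: column j (0-based) is the
# binary representation of j + 1, so all seven columns are distinct and non-zero.
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def syndrome(x):
    """Syndrome H x over F_2 of a length-7 binary tuple."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)

# Decoder table: each syndrome maps to the unique block of weight <= 1 producing it.
zero = (0,) * 7
table = {syndrome(zero): zero}
for pos in range(7):
    e = tuple(1 if i == pos else 0 for i in range(7))
    table[syndrome(e)] = e

def compress(x):
    """Encode a 7-bit source block as its 3-bit syndrome."""
    return syndrome(x)

def decompress(s):
    """Recover the lowest-weight block consistent with syndrome s."""
    return table[s]

block = (0, 0, 0, 0, 1, 0, 0)
recovered = decompress(compress(block))
print(compress(block), recovered == block)
```

The source block plays the role of an error pattern: any block of weight at most one is recovered exactly from three bits instead of seven, while heavier blocks would be decoded incorrectly, which is why the rate must be matched to the source statistics.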

In [21], Csiszar generalizes linear source coding techniques to allow linear multiple access
source codes that achieve the optimal performance derived by Slepian and Wolf [22]. Csiszar
demonstrates the universality of his proposed linear codes$^2$ and bounds the corresponding
error exponents. The linear coding results are generalized for more than two sources and 
receivers in These results are generalizable to single or multiple Markov sources. 

Addressing the problem of practical encoding and decoding for multiple access source 
codes, [231 12H 1213 12H1 12JJ introduce the Distributed Source Coding Using Syndromes (DIS- 
CUS) framework, initially looking at sources with strongly structured statistical dependen- 
cies. Schonberg et al. [2H] note that Csiszar's proof can be used to show that application 
of LDPC codes in the DISCUS framework approaches the Slepian- Wolf bound for general 
binary sources; they then demonstrate through simulation that belief propagation decoding 
works well in practice, with a small performance gap due to the finite block length and choice 
of parity check matrix. In |40 a , it is shown that LDPC codes can achieve any point in the 
Slepian- Wolf region with optimal decoding. Uyematsu proposes a deterministic construc- 
tion for linear multiple access source codes in [21]; the resulting codes achieve any point 
in the achievable rate region, with two-step encoding and decoding procedures (similar to 

2 In the given fixed-rate coding regime, a universal code is any code that achieves asymptotically negligible 
error probability on all sources for which the code's rate falls within the source's achievable rate region. 
concatenated codes for channel coding) of complexity polynomial in the block length.

In other related work, multiple access source code design by randomly choosing among 
general block codes is considered as an exercise in [20] • Loeliger considers averaging 
for sets of linear codes with basic symmetry properties and gives a general version of the 
Varshamov-Gilbert bound and a random coding bound that depend only on the size of the 
set of error patterns; these results extend corresponding prior results for more specific types 
of error patterns. Among the applications mentioned are burst error correction, and multiple 
access systems where each user considers the set of possible interference patterns arising from 
the activity of other users as well as channel noise. 

Zhao and Effros introduce broadcast system source codes in [32J [33]. In a broadcast 
system source code, a single encoder describes multiple sources to be decoded at a collection 
of receivers. Sources may include both "common information" intended for more than one 
receiver and "specific information" intended for only one receiver. In the most general 
case, they allow a distinct source for every non-empty subset of the set of possible receivers. 
Design algorithms and performance bounds for lossless broadcast system source codes appear 



Network coding, introduced in [33], is a generalization of routing for transmitting bits 
through lossless networks. The sufficiency of linear network codes for multicast networks is 
shown in [3J3 EH] whereas Koetter and Medard give an algebraic framework in [3JJ . Ref- 
erence [2E| considers a randomized approach for independent or linearly correlated sources, 
while jH] and [13] give polynomial-time deterministic and randomized network code con- 
structions for independent sources. Chou et al. [32] demonstrate the practical use of random 
linear codes over the network topologies of commercial Internet Service Providers. 

3.2 Preliminaries 

Since the focus of our paper is on the relationships between system components and concepts, 
we give all results in their simplest forms. In particular, we state our results and their 
corresponding derivations for independent, identically distributed (i.i.d) random processes 
and focus on binary source and channel alphabets, modified only for the inclusion of the 
erasure noise model. For simplicity, all code constructions combine random linear encoding 
with typical set decoding. The definition of the typical set $A_\epsilon^{(n)}$ for a single random sequence
$U_1, U_2, \ldots$ drawn i.i.d. according to probability mass function (pmf) $p$ is

$$
A_\epsilon^{(n)} = \Big\{ u^n \in \mathcal{U}^n : \Big| -\frac{1}{n} \log p(u^n) - H(U) \Big| \le \epsilon \Big\}.
$$

Given source alphabet $\mathcal{U}$, $H(U) = -\sum_{u \in \mathcal{U}} p(u) \log p(u)$ is the entropy of the i.i.d. random
process $U_1, U_2, \ldots$. By the Asymptotic Equipartition Property (AEP),

$$
|A_\epsilon^{(n)}| \le 2^{n(H(U) + \epsilon)}
$$

and $\Pr(U^n \in A_\epsilon^{(n)}) \to 1$ as $n \to \infty$. In most cases, we use context to distinguish between
typical sets. Thus $U^n \in A_\epsilon^{(n)}$ refers to the typical set for the pmf $p(u)$ of random variable $U$
while $Z^n \in A_\epsilon^{(n)}$ refers to the typical set for the pmf $q(z)$ of random variable $Z$. Focusing on
linear encoding and typical set decoding allows us to include the corresponding proofs and
illuminates the relationships between them.
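For small block lengths the typical set can be enumerated explicitly. The following sketch assumes the standard weakly typical set (sequences whose per-symbol log-likelihood is within ε of H(U)) for a Bernoulli(p) source; the parameter values p = 0.2, n = 16, ε = 0.2 and the function name `typical_set` are illustrative choices.

```python
import math
from itertools import product

def typical_set(p, n, eps):
    """Enumerate the weakly typical set of a Bernoulli(p) source at block length n.

    Returns (members, total probability of the set, entropy H in bits).
    """
    H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    members, mass = [], 0.0
    for u in product((0, 1), repeat=n):
        k = sum(u)                                   # number of ones in u
        prob = p ** k * (1 - p) ** (n - k)
        if abs(-math.log2(prob) / n - H) <= eps:     # typicality test
            members.append(u)
            mass += prob
    return members, mass, H

n, eps = 16, 0.2
members, mass, H = typical_set(p=0.2, n=n, eps=eps)
size_bound = 2 ** (n * (H + eps))

print(f"|A| = {len(members)} <= {size_bound:.0f}, Pr(A) = {mass:.3f}")
```

At this short block length the set already holds only a small fraction of the 2^16 sequences and respects the size bound; its probability approaches 1 only as n grows.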

While we state and prove our results in their simplest form for readability, we note that
all of the results given here generalize widely from the forms that we state explicitly. Some
of these generalizations are described below.

• While we focus on the binary alphabet, results generalize to arbitrary finite fields$^3$.
The requirement that the finite field be the same for all sources, channel codewords, 
and additive noise processes cannot, however, be relaxed in general. The channel 
output alphabet is allowed to differ only in the inclusion of erasures. In our model, 
erasures propagate as erasures when the output of one channel is fed into the input of 
a subsequent channel. 

• We state results for i.i.d source and noise random processes; the results generalize to 
stationary, ergodic processes for which corresponding typical sets exist. 

• We use distribution-dependent typical set decoders; many of the results in this paper
can be generalized to achieve universal coding performance and improved error
exponents using the maximal entropy decoders of Csiszar [21].

• We ignore decoder complexity issues; good (sub-optimal) decoders with lower com- 
plexity can be derived for many of the systems described here using sparse matrix 
techniques like the low-density parity-check (LDPC) coding techniques developed
by Gallager 11. MacKay [I], Urbanke et al. J5], and others. 

• We give results for the smallest generalizable instances of each network type (e.g., two-receiver
broadcast channels and three-receiver broadcast system source codes); our
results generalize to larger systems.

3.3 Single-Transmitter, Single-Receiver Networks 

We begin by examining simple forms of some of the prior results described in Section 3.1. In
particular, we give simple new proofs for the linear source and channel coding theorems for 
single-transmitter, single-receiver networks 01201 IS]- Our goal is to integrate, in a single 
simple framework, the earlier known results for linear source and channel coding. These 
new derivations demonstrate the relationships between these algorithms and random linear 
network coding techniques. We further provide a linear source coding converse. Finally, we 
extend the given random design arguments to design linear joint source-channel codes for 
the single-transmitter, single-receiver network. 

3.3.1 Linear Source Coding 

Given a single-transmitter, single-receiver network, source coding is equivalent to network
coding of compressible source sequences. We say that a network code accomplishes optimal
source coding on a noise-free network if that code can be used to transmit any source with
entropy lower than the network capacity with asymptotically negligible error probability.

3 The results are not restricted to finite fields and hold even when the alphabet possesses a ring structure.

Shannon's achievability result for lossless source coding demonstrates that for U\, U2, ■ ■ ■ 
drawn i.i.d from a Bernoulli(p) distribution and any e > 0, there exists a fixed-rate-(if (C/)+e) 
code for which the probability of decoding error can be made arbitrarily small as the coding 
dimension n grows without bound. The converse to Shannon's source coding theorem states 
that asymptotically negligible error probabilities cannot be achieved with rates lower than 
H(U). We begin by showing that the expected error probability of a randomly chosen,
rate-$R$, linear source code approaches zero as $n$ grows without bound for any source $U$ with

$$H(U) < R.$$

Theorem 1 Let $U_1, U_2, \ldots, U_n$ be drawn i.i.d. according to distribution $p(u)$ on $\mathbb{F}_2$. For any
rate $R > H(U)$, the error probability averaged over the ensemble of random linear source
codes tends to 0 as the codeword length tends to $\infty$.

Proof: The fixed-rate, linear encoder is independent of the source distribution. We use
distribution-dependent typical set decoders for simplicity. We first describe the source
encoder and decoder for a fixed linear code.

Let $A_n$ be an $\lceil nR \rceil \times n$ matrix with coefficients in the binary field $\mathbb{F}_2$. The encoder for the
linear source code based on $A_n$ is

$$\alpha_n(u^n) = A_n \mathbf{u},$$

where $u^n = \mathbf{u} \in (\mathbb{F}_2)^n$ is an arbitrary source sequence with blocklength $n$. The corresponding
decoder is

$$\beta_n(v^{\lceil nR \rceil}) = \begin{cases} u^n & \text{if } u^n \in A_\epsilon^{(n)} \text{ and } A_n \mathbf{u} = \mathbf{v} \text{ and } \nexists\, \hat{u}^n \in A_\epsilon^{(n)} \cap \{\mathbf{u}\}^c \text{ s.t. } A_n \hat{\mathbf{u}} = \mathbf{v}, \\ \hat{U}^n & \text{otherwise,} \end{cases}$$

where $v^{\lceil nR \rceil} = \mathbf{v} \in (\mathbb{F}_2)^{\lceil nR \rceil}$ and decoding to $\hat{U}^n$ denotes a random decoder output (which
yields a decoding error by assumption). The error probability for source code $A_n$ is

$$P_e(A_n) = \Pr(\beta_n(\alpha_n(U^n)) \neq U^n).$$

We design a sequence $\{A_n\}_{n=1}^{\infty}$ of codes at random and show that if the rate is chosen
appropriately, then the expected error probability $E[P_e(A_n)]$ of the randomly chosen code
decays to zero as $n$ grows without bound. Using the above encoder and decoder definitions
and letting $\mathbf{w} \in \mathbb{F}_2^n$ be an arbitrary nonzero vector,

$$\begin{aligned}
E[P_e^{(n)}] &= E[\Pr(\beta_n(\alpha_n(U^n)) \neq U^n)] \\
&= \sum_{u^n \notin A_\epsilon^{(n)}} p(u^n) \Pr(\beta_n(\alpha_n(u^n)) \neq u^n) + \sum_{u^n \in A_\epsilon^{(n)}} p(u^n) \Pr(\beta_n(\alpha_n(u^n)) \neq u^n) \\
&\leq \epsilon_n + \sum_{u^n \in A_\epsilon^{(n)}} p(u^n) \sum_{\hat{u}^n \in A_\epsilon^{(n)}} 1(\hat{u}^n \neq u^n) \Pr(A_n(\mathbf{u} - \hat{\mathbf{u}}) = \mathbf{0}) \qquad (4) \\
&\leq \epsilon_n + 2^{n(H(U)+\epsilon)} \Pr(A_n \mathbf{w} = \mathbf{0}) \qquad (5) \\
&\leq \epsilon_n + 2^{n(H(U)+\epsilon)} 2^{-\lceil nR \rceil} \qquad (6)
\end{aligned}$$

for some $\epsilon_n \to 0$. Equation (4) and the bound on the size of the typical set follow from
the AEP. The symmetry represented by the introduction of $\mathbf{w}$ in (5) and the bound on the
corresponding probability in (6) result from the following argument. Let $k$ be the number
of ones in an arbitrary $\mathbf{w} \neq \mathbf{0}$. Then each coefficient of vector $A_n \mathbf{w}$ is the sum of $k$ independent
Bernoulli(1/2) random variables. Since summing i.i.d. Bernoulli(1/2) random variables
over $\mathbb{F}_2$ yields a Bernoulli(1/2) random variable and the rows of $A_n$ are chosen independently, $A_n \mathbf{w}$
is uniformly distributed over its $2^{\lceil nR \rceil}$ possible outcomes.

By (6), $E[P_e^{(n)}] \to 0$ as $n \to \infty$ provided that $\lceil nR \rceil > n(H(U) + \epsilon)$. □
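The uniformity argument used for (5) and (6) can be checked exhaustively for small dimensions. The sketch below is only an illustration: the function name `syndrome_counts` and the dimensions are assumptions chosen so that all matrices can be enumerated. It tallies $A \mathbf{w}$ over every binary $k \times n$ matrix $A$ for one fixed nonzero $\mathbf{w}$.

```python
from collections import Counter
from itertools import product

def syndrome_counts(k, n, w):
    """Tally A w (mod 2) over every k x n binary matrix A, for a fixed vector w."""
    counts = Counter()
    for bits in product((0, 1), repeat=k * n):
        # Slice the flat bit tuple into the k rows of A.
        rows = [bits[r * n:(r + 1) * n] for r in range(k)]
        counts[tuple(sum(a * x for a, x in zip(row, w)) % 2 for row in rows)] += 1
    return counts

# 64 matrices in total (k = 2, n = 3), and 2^k = 4 possible values of A w.
counts = syndrome_counts(2, 3, (1, 0, 1))
```

Each of the four outcomes is hit by the same number of matrices, so a uniformly chosen $A$ gives $\Pr(A\mathbf{w} = \mathbf{0}) = 2^{-k}$, which is the factor $2^{-\lceil nR \rceil}$ in (6).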

We now present Lemma 3, which provides a form of converse to Theorem 1. While
Theorem 1 shows that linear source codes are asymptotically optimal, Lemma 3 shows that
any fixed non-trivial linear code yields statistically dependent output symbols. The result
of this lemma highlights one difference between the fixed-rate, asymptotically lossless linear 
codes investigated here and the more typically applied variable-rate, truly lossless source 
coding schemes like Huffman and arithmetic codes. Variable-rate schemes can achieve lossless 
performance for any blocklength and precisely achieve the entropy for dyadic distributions. 

Lemma 3 Given any $n > 1$, let $p_1, \ldots, p_n$ be non-uniform probability mass functions on
the mutually independent random variables $U_1, \ldots, U_n$. Defining $V = (V_1, \ldots, V_k)^T$ and
$U = (U_1, \ldots, U_n)^T$, let

$$V = aU$$

for an arbitrary $k \times n$ matrix $a$. If $V_1, V_2, \ldots, V_k$ are mutually independent, then matrix $a$
has at most one non-zero element in each column.

Proof: See Appendix 1. □ 
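Lemma 3 can be illustrated on a two-symbol example. The sketch below is an assumption-laden illustration, not the argument of Appendix 1: the helper names, the matrices, and the Bernoulli(1/4) source are all chosen here for convenience. It computes the exact joint pmf of $V = aU$ over $\mathbb{F}_2$ and tests whether the outputs are independent.

```python
from fractions import Fraction
from itertools import product

def output_pmf(a, p):
    """Joint pmf of V = aU over F_2, with independent U_j ~ Bernoulli(p[j])."""
    pmf = {}
    for u in product((0, 1), repeat=len(p)):
        prob = Fraction(1)
        for uj, pj in zip(u, p):
            prob *= pj if uj else 1 - pj
        v = tuple(sum(x * y for x, y in zip(row, u)) % 2 for row in a)
        pmf[v] = pmf.get(v, 0) + prob
    return pmf

def is_independent(pmf, k):
    """True if the joint pmf equals the product of its k marginals."""
    marg = [sum(pr for v, pr in pmf.items() if v[i] == 1) for i in range(k)]
    for v in product((0, 1), repeat=k):
        expected = Fraction(1)
        for i, vi in enumerate(v):
            expected *= marg[i] if vi else 1 - marg[i]
        if pmf.get(v, 0) != expected:
            return False
    return True

skewed = [Fraction(1, 4), Fraction(1, 4)]    # non-uniform source symbols
identity = [(1, 0), (0, 1)]                  # one non-zero entry per column
mixing = [(1, 0), (1, 1)]                    # first column has two non-zero entries
```

With the skewed source, `identity` gives independent outputs while `mixing` does not, in line with the lemma. With a uniform source, `mixing` also gives independent outputs, which shows why the non-uniformity assumption is needed.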
An immediate consequence of this observation is the following corollary: 

Corollary 1 Linear source codes cannot achieve the entropy bound for non-uniform sources. 

This corollary follows from the fact that achieving the entropy bound necessarily yields an 
incompressible data sequence. We address the advantages of fixed-rate codes later in this 
section by showing how fixed-rate, linear source and channel codes combine naturally to give 
linear joint source-channel codes. 

3.3.2 Linear Channel Coding 

Just as source coding can be viewed as an extension of network coding to applications with 
statistically dependent input symbols, channel coding can be viewed as an extension of 
network coding to unreliable channels. Prior network coding results address the issue of 
robust communication over unreliable channels by considering strategies for working with 
non-ergodic link failures [H3 EB1- We here investigate ergodic failures. A network code 
designed for a single-transmitter, single-receiver network with ergodic failures is a channel 
code for the erasure channel. We say that a network code accomplishes optimal channel 
coding on the given channel if the network code can be used to transmit any source with 
rate lower than the noisy channel capacity with asymptotically negligible error probability.

Shannon's channel coding theorem shows that for any channel with capacity $C$, there
exists a rate $C - \epsilon$ code, where $\epsilon > 0$, such that the decoding error probability can be made
arbitrarily small as we increase the codeword length. The converse states that asymptotically
low error probabilities cannot be obtained for rates above the channel capacity. We now show
that random linear channel codes achieve the capacity for the binary erasure and additive
noise channels.

Theorem 2 Consider an erasure channel with input and output alphabets equal to $\mathbb{F}_2$ and
$\{0, 1, E\}$, respectively. The erasure sequence $Z_1, Z_2, \ldots$ is drawn i.i.d. according to distribution
$q(z)$, where $Z_i = 1$ denotes the erasure event, and $Z_i = 0$ designates a successful transmission.
The channel noise is independent of the channel input by assumption. If the transmission
rate is less than the channel capacity, i.e., $R < 1 - q(1)$, then the error probability averaged
over the ensemble of random linear channel codes tends to 0 as the codeword length tends to
$\infty$.

Proof: See Appendix 1. □ 
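For intuition about Theorem 2, decoding a linear code over the erasure channel reduces to solving linear equations over $\mathbb{F}_2$ on the unerased positions. The sketch below is a minimal illustration and not necessarily the decoder analysed in Appendix 1; the generator matrix `B`, the message `v`, and the function names are hypothetical choices made here.

```python
def gf2_solve(rows, rhs, m):
    """Solve rows * v = rhs over F_2 by Gaussian elimination; None if v is not unique."""
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivot_row = 0
    for col in range(m):
        pivot = next((r for r in range(pivot_row, len(aug)) if aug[r][col]), None)
        if pivot is None:
            return None  # the surviving equations do not determine this coordinate
        aug[pivot_row], aug[pivot] = aug[pivot], aug[pivot_row]
        for r in range(len(aug)):
            if r != pivot_row and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    return [aug[i][m] for i in range(m)]

def encode(B, v):
    """Channel codeword B v over F_2."""
    return [sum(b * x for b, x in zip(row, v)) % 2 for row in B]

def decode_erasures(B, y):
    """Recover v from the unerased symbols of y, where erasures are marked 'E'."""
    kept = [(row, s) for row, s in zip(B, y) if s != "E"]
    return gf2_solve([r for r, _ in kept], [s for _, s in kept], len(B[0]))

# Hypothetical 6 x 3 generator matrix (rate 1/2) and message.
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
v = [1, 0, 1]
x = encode(B, v)
y = ["E", x[1], "E", x[3], x[4], x[5]]   # two erasures
v_hat = decode_erasures(B, y)
```

Decoding succeeds exactly when the rows of `B` at the unerased positions have full column rank, which is the event whose probability the random-coding argument controls.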
For the binary additive noise channel model, the noise may be viewed either as true noise, 
or as the signal of another user that has been combined with the desired signal at some node 
of a network. The second interpretation is only useful when the interfering signal is not i.i.d 
uniform; we treat interference channels in detail in Section |3J We begin with the channel 
coding theorem for the additive noise channel. 

Theorem 3 Consider an additive noise channel with input, output, and noise alphabets all
equal to the binary field $\mathbb{F}_2$. Let noise $Z_1, Z_2, \ldots$ be drawn i.i.d. according to distribution
$q(z)$. The channel noise is independent of the channel input. If the transmission rate $R$ is
less than the channel capacity, i.e., $R < 1 - H(Z)$, then the error probability averaged over
the ensemble of random linear channel codes tends to 0 as the codeword length tends to $\infty$.

Proof: Let $A_n$ be an $\lceil n(1 - R) \rceil \times n$ matrix with coefficients in $\mathbb{F}_2$. For channel coding, $A_n$
plays the traditional role of the parity check matrix. Following Csiszar [21], however, we
interpret $A_n$ as a source code on the noise. For any matrix $A_n$, we can design an $n \times \lfloor nR \rfloor$
matrix $B_n$ such that $B_n$ has full rank and $A_n B_n = 0$. Matrix $B_n$ plays the role of the
generator matrix for the desired channel code. We design $B_n$ to have full rank so that each
length-$\lfloor nR \rfloor$ input message maps to a distinct channel codeword. We force $A_n B_n = 0$ so
that each codeword is in the null space of $A_n$, making possible separation of the encoded
message from the additive noise.
More precisely, the channel encoder is defined by

$$\mathbf{v} \mapsto B_n \mathbf{v}.$$

The channel output for a random channel input $B_n \mathbf{V}$ is

$$\mathbf{Y} = B_n \mathbf{V} + \mathbf{Z}.$$

In decoding the channel output, the receiver first multiplies $\mathbf{Y}$ by $A_n$ to give

$$A_n \mathbf{Y} = A_n (B_n \mathbf{V} + \mathbf{Z}) = A_n \mathbf{Z}.$$

The result of this multiplication is a source coded description of the error signal $\mathbf{Z}$. Thus
the decoding procedure involves applying source decoder $\beta_n$ to $A_n \mathbf{Y}$. The error is decoded
correctly with high probability. The receiver then subtracts the error estimate from the
received $\mathbf{Y}$ to yield, with high probability, $B_n \mathbf{V}$. Since $B_n$ has full rank, the receiver can
recover $\mathbf{V}$ perfectly from $B_n \mathbf{V}$. Thus the channel code's error probability equals the error
probability for the corresponding source code on the error signal $Z^n$. Given this insight, the
channel coding theorem is an immediate extension of the source coding theorem.

As in the proof of Theorem 1, we choose a sequence $\{A_n\}_{n=1}^{\infty}$ of matrices at random. This
is our source code for the noise. For each $A_n$, we design an $n \times (n - k)$ matrix $B_n$ such that
$B_n$ has full rank and $A_n B_n = 0_{k \times (n-k)}$, where $k = \lceil n(1 - R) \rceil$ is the number of rows of $A_n$.
By the argument given above, the error probability
for the given channel code equals the error probability for the corresponding source code on
the error signal $Z^n$. By Theorem 1, the expected value of this error probability goes to zero
as $n$ grows without bound for all $\lceil n(1 - R) \rceil > nH(Z)$, giving an asymptotically negligible
error probability for any $R < 1 - H(Z)$. □
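The decoding steps of this proof (multiply by the parity check matrix, source-decode the noise, subtract it) can be sketched on a small fixed code. Everything specific below is an assumption made for illustration: the block `P` is a hypothetical choice (it happens to give the (7,4) Hamming code), the code is systematic so the message can be read off the codeword, and a minimum-weight noise estimate stands in for the typical set decoder used in the proof.

```python
from itertools import product

def matvec(mat, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in mat)

# Hypothetical 3 x 4 block P. With A = [P | I_3] and B = [I_4 ; P],
# A B = P + P = 0 over F_2, and B has full column rank.
P = [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1)]
K, M = len(P), len(P[0])          # K parity checks, M message bits
N = K + M                         # blocklength
A = [P[i] + tuple(int(i == j) for j in range(K)) for i in range(K)]
B = [tuple(int(i == j) for j in range(M)) for i in range(M)] + P

def build_noise_decoder(A, n):
    """Source decoder for the noise: map each syndrome to a minimum-weight pattern."""
    table = {}
    for z in sorted(product((0, 1), repeat=n), key=sum):
        table.setdefault(matvec(A, z), z)
    return table

def channel_decode(y, A, table, m):
    """Estimate the noise from A y, subtract it, and read the message off B v."""
    z_hat = table[matvec(A, y)]
    codeword = tuple(a ^ b for a, b in zip(y, z_hat))
    return codeword[:m]           # systematic code: first m symbols are the message

TABLE = build_noise_decoder(A, N)
```

For this particular code every noise pattern with at most one flipped bit is recovered, so `channel_decode(y, A, TABLE, M)` returns the transmitted message whenever the channel flips at most one symbol.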



3.3.3 Linear Joint Source-Channel Coding

From the coding theorem for single transmitter-single receiver channels, any binary source 
U with entropy H(U) can be transmitted over a channel with capacity C with arbitrarily 
low probability of decoding error as long as $H(U) < C$. Moreover, if $H(U) > C$, the
probability of error is bounded away from zero, and it is not possible to send the source 
process reliably over the channel. We now show that random linear joint source-channel 
codes achieve capacity for the binary erasure and additive noise channels. 

Since source-channel separation holds for the single-transmitter, single-receiver network, 
concatenating optimal linear source and channel codes yields an optimal linear joint source- 
channel code. As an alternative to this approach, where we design separate random linear 
source and channel codes and concatenate them together, we can design a joint source- 
channel code at random and decode in a single typical set decoding argument. While we 
stick with the traditional name of joint source-channel coding, we note that the code does not 
perform the separate functions of source and channel coding jointly. Instead, the code maps 
source sequences to channel inputs in a manner that allows robust communication without 
any explicit or implicit compression or addition of channel coding redundancy. We present 
Theorems 4 and 5, which show that random linear joint source-channel codes are optimal
for sending i.i.d Bernoulli sources over the binary erasure channel and binary additive noise 
channel, respectively. 

Theorem 4 Consider the random source $U_1, U_2, \ldots$ drawn i.i.d. according to distribution
$p(u)$, and let $Z_1, Z_2, \ldots$ be the channel's random erasures, where $Z_1, Z_2, \ldots$ are drawn i.i.d.
according to distribution $q(z)$ and are independent of the source. (Again $Z_i = 1$ denotes an
erasure event.) Assume that the source and channel input alphabets are equal to the binary
field $\mathbb{F}_2$. If $H(U) < 1 - q(1)$, then the error probability averaged over the ensemble of random
linear codes tends to 0 as the codeword length tends to $\infty$.



Proof: See Appendix 1. □

Theorem 5 Consider the random source $U_1, U_2, \ldots$ drawn i.i.d. according to distribution
$p(u)$, and let $Z_1, Z_2, \ldots$ be the channel's random additive noise, where $Z_1, Z_2, \ldots$ are drawn
i.i.d. according to distribution $q(z)$ and are independent of the source. Assume that the source,
channel input, channel output, and noise alphabets are all equal to the binary field $\mathbb{F}_2$. If
$H(U) < 1 - H(Z)$, then the error probability averaged over the ensemble of random linear
codes tends to 0 as the codeword length tends to $\infty$.

Proof: See Appendix 1. □ 



4 Multiple Access Networks 

The techniques applied in the previous section for single-transmitter, single-receiver systems 
can also be applied to the design of linear source and channel codes for multiple access 
networks. For these networks, we show that source-channel separation holds when noise is 
independent of inputs and prove the optimality of linear joint source-channel codes. 

4.1 Linear Source Coding 

In [22!) Slepian and Wolf derive the rate region for multiple access source codes. Csiszar 
generalizes linear source coding techniques in [25 to show that linear multiple access source 
codes achieve all points in the rate region with arbitrarily small probability of error. We 
begin with a simple and short re-derivation of the linear multiple access source codes first 
studied by Csiszar. Our goal is to integrate the known results on source coding into our 
framework. 

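Theorem 6 Consider source sequence $(U_{1,1}, U_{2,1}), (U_{1,2}, U_{2,2}), \ldots$ drawn i.i.d. according to
distribution $p(u_1, u_2)$ on $(\mathbb{F}_2)^2$. Then for any rates

$$\begin{aligned}
R_1 &> H(U_1 | U_2) \\
R_2 &> H(U_2 | U_1) \\
R_1 + R_2 &> H(U_1, U_2)
\end{aligned}$$

the error probability averaged over the ensemble of random linear multiple-access source codes
tends to 0 as the codeword lengths tend to $\infty$.

Proof: Given $\lceil nR_1 \rceil \times n$ matrix $A_{1,n}$ and $\lceil nR_2 \rceil \times n$ matrix $A_{2,n}$, we associate with $(A_{1,n}, A_{2,n})$
a blocklength-$n$, two-transmitter, linear multiple access source code as follows. For any
$u_1^n = \mathbf{u}_1 \in (\mathbb{F}_2)^n$ and $u_2^n = \mathbf{u}_2 \in (\mathbb{F}_2)^n$, encoders 1 and 2 are defined by

$$\begin{aligned}
\alpha_{1,n}(u_1^n) &= A_{1,n} \mathbf{u}_1 \\
\alpha_{2,n}(u_2^n) &= A_{2,n} \mathbf{u}_2.
\end{aligned}$$

For any $v_1^{\lceil nR_1 \rceil} = \mathbf{v}_1 \in (\mathbb{F}_2)^{\lceil nR_1 \rceil}$ and $v_2^{\lceil nR_2 \rceil} = \mathbf{v}_2 \in (\mathbb{F}_2)^{\lceil nR_2 \rceil}$, the decoder is defined by

$$\beta_n(v_1^{\lceil nR_1 \rceil}, v_2^{\lceil nR_2 \rceil}) = \begin{cases} (u_1^n, u_2^n) & \text{if } (u_1^n, u_2^n) \in A_\epsilon^{(n)} \text{ and } (A_{1,n} \mathbf{u}_1, A_{2,n} \mathbf{u}_2) = (\mathbf{v}_1, \mathbf{v}_2) \text{ and} \\ & \nexists\, (\hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2) \in A_\epsilon^{(n)} \cap \{(\mathbf{u}_1, \mathbf{u}_2)\}^c \text{ s.t. } (A_{1,n} \hat{\mathbf{u}}_1, A_{2,n} \hat{\mathbf{u}}_2) = (\mathbf{v}_1, \mathbf{v}_2), \\ (\hat{U}_1^n, \hat{U}_2^n) & \text{otherwise.} \end{cases}$$

Again, decoding to $(\hat{U}_1^n, \hat{U}_2^n)$ denotes an error event.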

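An error occurs if either or both of the source sequences is decoded in error. Thus, following
an argument very similar to those seen previously,

$$\begin{aligned}
E[P_e(A_{1,n}, A_{2,n})]
&= E[\Pr(\beta_n(\alpha_{1,n}(U_1^n), \alpha_{2,n}(U_2^n)) \neq (U_1^n, U_2^n) \wedge (U_1^n, U_2^n) \notin A_\epsilon^{(n)})] \\
&\quad + E[\Pr(\beta_n(\alpha_{1,n}(U_1^n), \alpha_{2,n}(U_2^n)) \neq (U_1^n, U_2^n) \wedge (U_1^n, U_2^n) \in A_\epsilon^{(n)})] \\
&\leq \epsilon_n + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}} p(u_1^n, u_2^n) \sum_{\hat{u}_2^n : (u_1^n, \hat{u}_2^n) \in A_\epsilon^{(n)}} 1(\hat{u}_2^n \neq u_2^n) \Pr(A_{2,n}(\hat{\mathbf{u}}_2 - \mathbf{u}_2) = \mathbf{0}) \\
&\quad + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}} p(u_1^n, u_2^n) \sum_{\hat{u}_1^n : (\hat{u}_1^n, u_2^n) \in A_\epsilon^{(n)}} 1(\hat{u}_1^n \neq u_1^n) \Pr(A_{1,n}(\hat{\mathbf{u}}_1 - \mathbf{u}_1) = \mathbf{0}) \\
&\quad + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}} p(u_1^n, u_2^n) \sum_{(\hat{u}_1^n, \hat{u}_2^n) \in A_\epsilon^{(n)}} 1(\hat{u}_1^n \neq u_1^n)\, 1(\hat{u}_2^n \neq u_2^n) \\
&\qquad \cdot \Pr((A_{1,n}(\hat{\mathbf{u}}_1 - \mathbf{u}_1), A_{2,n}(\hat{\mathbf{u}}_2 - \mathbf{u}_2)) = (\mathbf{0}, \mathbf{0})) \\
&\leq \epsilon_n + 2^{n(H(U_2|U_1)+2\epsilon)} \Pr(A_{2,n} \mathbf{w} = \mathbf{0}) + 2^{n(H(U_1|U_2)+2\epsilon)} \Pr(A_{1,n} \mathbf{w} = \mathbf{0}) \\
&\quad + 2^{n(H(U_1,U_2)+\epsilon)} \Pr(A_{1,n} \mathbf{w}_1 = \mathbf{0} \wedge A_{2,n} \mathbf{w}_2 = \mathbf{0}) \\
&= \epsilon_n + 2^{-(\lceil nR_2 \rceil - n(H(U_2|U_1)+2\epsilon))} + 2^{-(\lceil nR_1 \rceil - n(H(U_1|U_2)+2\epsilon))} \\
&\quad + 2^{-(\lceil nR_1 \rceil + \lceil nR_2 \rceil - n(H(U_1,U_2)+\epsilon))}
\end{aligned}$$

for arbitrary, non-zero $\mathbf{w}, \mathbf{w}_1, \mathbf{w}_2 \in \mathbb{F}_2^n$ and some $\epsilon_n \to 0$. Thus for all $(\lceil nR_1 \rceil, \lceil nR_2 \rceil)$
satisfying $\lceil nR_1 \rceil > n(H(U_1|U_2) + 2\epsilon)$, $\lceil nR_2 \rceil > n(H(U_2|U_1) + 2\epsilon)$, and $\lceil nR_1 \rceil + \lceil nR_2 \rceil >
n(H(U_1, U_2) + \epsilon)$, $E[P_e(A_{1,n}, A_{2,n})] \to 0$ as $n$ grows without bound. □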
This theorem proves the optimality of linear multiple access source codes and re-establishes 
the results of Csiszar [2*T] . 
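The three rate conditions of Theorem 6 are easy to evaluate for a concrete source. The sketch below is an illustration only: the function names and the example pmf (a doubly symmetric binary source with crossover probability 0.1) are assumptions introduced here.

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def slepian_wolf_bounds(pmf):
    """pmf maps (u1, u2) to a probability. Returns H(U1|U2), H(U2|U1), H(U1,U2)."""
    h12 = entropy(list(pmf.values()))
    p1 = [sum(p for (a, _), p in pmf.items() if a == x) for x in (0, 1)]
    p2 = [sum(p for (_, b), p in pmf.items() if b == x) for x in (0, 1)]
    return h12 - entropy(p2), h12 - entropy(p1), h12

def in_rate_region(r1, r2, pmf):
    """Check the three strict inequalities of the theorem for the rate pair (r1, r2)."""
    h1_given_2, h2_given_1, h12 = slepian_wolf_bounds(pmf)
    return r1 > h1_given_2 and r2 > h2_given_1 and r1 + r2 > h12

# Hypothetical correlated source: U2 equals U1 flipped with probability 0.1.
example_pmf = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
```

For this source each conditional entropy is the binary entropy of 0.1, so one encoder can operate well below one bit per symbol provided the sum rate still exceeds $H(U_1, U_2)$.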



4.2 Linear Channel Coding 

Application of linear channel coding techniques to achieve linear multiple access channel 
codes is more straightforward than the corresponding source coding result. In particular, we 
consider the two additive multiple access channels shown in Figure 5. The first is the additive
multiple access channel with erasures and the second is the additive multiple access channel
with additive noise. The additive channel with interference only (no channel noise) can be
viewed as a special case of either of the noisy models where errors or erasures occur with
probability zero.

Figure 5: Binary additive multiple access channels with (a) erasures and (b) additive noise.
In both cases, $Z_1, Z_2, \ldots$ are i.i.d. and independent of the channel inputs.

Let $X_1^n$ and $X_2^n$ denote the random channel inputs and use $Y^n$ to denote
the corresponding random channel output. $Y^n$ equals $X_1^n + X_2^n$ corrupted by erasures in the
erasure channel model; we denote the probability of an erasure as $q(1)$. For the additive
noise channel model, $Y^n$ equals $X_1^n + X_2^n + Z^n$, where $Z^n$ is the i.i.d. additive binary noise.
Both examples use addition over the binary field, and noise is independent of the channel
inputs. We begin by deriving the NSMA capacity regions of the multiple access channel with
erasures and the multiple access channel with additive noise. The following lemma describes
these regions:

Lemma 4 The NSMA capacity region for both the additive multiple access channel with
erasures and the additive multiple access channel with additive noise equals the rate region
achieved by time-sharing between the points $(C, 0)$ and $(0, C)$, where $C = 1 - q(1)$ for the
erasure model and $C = 1 - H(Z)$ for the additive noise model.

Proof: See Appendix 2. □ 
Since time-sharing between two linear codes can itself be described as a linear code, the 
time-sharing solution demonstrates not only that the end points are achievable by linear 
codes but also that all points in the set of achievable rates are achievable by linear multiple 
access channel codes. Therefore, we have Theorems 7 and 8, which state that random linear multiple
access channel codes achieve the NSMA capacity for the binary multiple access channel with 
erasures and the binary multiple access channel with additive noise, respectively. 
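The region of Lemma 4 is simple enough to compute directly. The sketch below is an illustration under assumed names (`capacity_erasure`, `capacity_additive`, `time_share`): it evaluates $C$ for each channel model and returns the rate pair obtained by using the $(C, 0)$ code for a fraction `lam` of the time and the $(0, C)$ code otherwise.

```python
from math import log2

def binary_entropy(p):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)

def capacity_erasure(q1):
    """C = 1 - q(1), where q1 is the erasure probability."""
    return 1.0 - q1

def capacity_additive(pz1):
    """C = 1 - H(Z), where pz1 = Pr(Z = 1)."""
    return 1.0 - binary_entropy(pz1)

def time_share(c, lam):
    """Rate pair on the segment between (c, 0) and (0, c), for 0 <= lam <= 1."""
    return (lam * c, (1 - lam) * c)

r1, r2 = time_share(capacity_erasure(0.25), 0.4)
```

Every pair produced this way satisfies $R_1 + R_2 = C$, which is the boundary appearing in the sum-rate conditions of Theorems 7 and 8.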

Theorem 7 Consider a multiple access channel with input alphabets $\mathcal{X}_1 = \mathcal{X}_2 = \mathbb{F}_2$ and
output alphabet $\mathcal{Y} = \{0, 1, E\}$. If the channel inputs at time $i$ are $X_{1,i}$ and $X_{2,i}$, then the channel
output at time $i$ is the binary sum $X_{1,i} + X_{2,i}$ with probability $q(0)$ and $E$ with probability $q(1)$.
Erasures are i.i.d. and independent of the channel inputs. The error probability averaged over
the ensemble of rate-$(R_1, R_2)$ random linear multiple access channel codes tends to 0 as the
codeword lengths tend to $\infty$, if $R_1 + R_2 < 1 - q(1)$.

Proof: See Appendix 2. □ 



Theorem 8 Consider a multiple access channel with input-independent, additive noise.
Suppose that the input alphabets, output alphabet, and noise alphabet are all equal to the
binary field $\mathbb{F}_2$. Let noise $Z_1, Z_2, \ldots$ be drawn i.i.d. according to distribution $q(z)$. If the channel
inputs at time $i$ are $X_{1,i}$ and $X_{2,i}$, then the channel output at time $i$ is $Y_i = X_{1,i} + X_{2,i} + Z_i$.
The error probability averaged over the ensemble of rate-$(R_1, R_2)$ random linear multiple
access channel codes tends to 0 as the codeword lengths tend to $\infty$, if $R_1 + R_2 < 1 - H(Z)$.

Proof: See Appendix 2. □ 



4.3 Linear Joint Source-Channel Coding

We start by showing that source-channel separation holds for binary sources and binary
erasure or additive noise multiple access channels where the erasure or additive noise is 
independent of channel inputs. This result is embodied in the following theorem that uses 
Lemma n from section 2 in its proof: 

Theorem 9 For any pair of binary sources and any binary erasure or additive noise multiple 
access channel where the erasure or additive noise is independent of the channel inputs, 
separation holds.$^4$

Proof: See Appendix 2. □ 
This leads to Theorems 10 and 11, which apply to the binary multiple access channel with
erasures and additive noise, respectively:

Theorem 10 Consider a multiple access channel with input alphabets $\mathcal{X}_1 = \mathcal{X}_2 = \mathbb{F}_2$ and
output alphabet $\mathcal{Y} = \{0, 1, E\}$. If the channel inputs at time $i$ are $X_{1,i}$ and $X_{2,i}$, then the
channel output at time $i$ is the binary sum $X_{1,i} + X_{2,i}$ with probability $q(0)$ and $E$ with
probability $q(1)$; the erasure events are i.i.d. and independent of the channel inputs. If
source pair $(U_{1,1}, U_{2,1}), (U_{1,2}, U_{2,2}), \ldots$ is drawn i.i.d. according to distribution $p(u_1, u_2)$ with
$H(U_1, U_2) < 1 - q(1)$, then there exists a sequence of joint source-channel codes with probability
of error $P_e^{(n)} \to 0$. Conversely, if $H(U_1, U_2) > 1 - q(1)$, then the probability of error
for any communication system is bounded away from zero.

Proof: See Appendix 2. □ 



Theorem 11 Consider a multiple access channel with input-independent, additive noise.
Suppose that the input alphabets, output alphabet, and noise alphabet are all equal to the
binary field $\mathbb{F}_2$. Let noise $Z_1, Z_2, \ldots$ be drawn i.i.d. according to distribution $q(z)$. If source pair
$(U_{1,1}, U_{2,1}), (U_{1,2}, U_{2,2}), \ldots$ is drawn i.i.d. according to distribution $p(u_1, u_2)$ with $H(U_1, U_2) <
1 - H(Z)$, then there exists a sequence of joint source-channel codes with probability of error
$P_e^{(n)} \to 0$. Conversely, if $H(U_1, U_2) > 1 - H(Z)$, then the probability of error is bounded
away from zero.

Proof: See Appendix 2. □

$^4$ In order to maximize the mutual information between the inputs and output of a multiple access channel
with input-independent noise, we need to maximize the entropy of the channel output. Binary addition
(XOR) of two independent binary random variables corresponds to circular convolution of their probability
mass functions (pmfs). If one of the pmfs is uniform, the binary sum has a uniform distribution, which
maximizes its entropy. Thus, if the channel inputs are uniform, they maximize the entropy of the
channel output for an additive multiple access channel operating over the binary field. It is this property of
circular convolution that gives rise to source-channel separation in multiple access networks operating over
finite fields.
Since source-channel separation holds for the multiple access channel with input-independent 
erasures and additive noise, we can combine the optimal linear source and channel codes into 
a single joint source-channel code to achieve optimal performance. 
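The circular-convolution property that underlies separation here (binary addition of independent variables, one of them uniform, yields a uniform sum) can be verified in a few lines. The function name `xor_pmf` and the example distributions are assumptions made for this illustration.

```python
from fractions import Fraction

def xor_pmf(p, q):
    """pmf of X1 + X2 over F_2 for independent X1 ~ p and X2 ~ q.

    p and q are lists [Pr(0), Pr(1)]; the result is their circular convolution.
    """
    return [sum(p[a] * q[a ^ y] for a in (0, 1)) for y in (0, 1)]

skewed = [Fraction(9, 10), Fraction(1, 10)]
uniform = [Fraction(1, 2), Fraction(1, 2)]
mixed = xor_pmf(skewed, uniform)     # uniform, whatever the first pmf is
biased = xor_pmf(skewed, skewed)     # stays non-uniform
```

A uniform input therefore makes the channel output uniform regardless of the other input's distribution, which is why uniform (fully compressed) inputs maximize the output entropy.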

We now show that instead of concatenating the linear source and channel codes, we may use 
a random linear code that maps source sequences directly to channel inputs in a manner that 
allows optimal performance for the binary multiple access channel with input-independent 
erasures or additive noise. The two linear joint source-channel encoders do not cooperate
with each other.

Theorems 12 and 13 show that random linear joint source-channel codes achieve performance
equivalent to that given in Theorems 10 and 11, respectively, and are thus optimal for the
binary multiple access channel with input-independent erasures or additive noise.

Theorem 12 Consider the random source $(U_{1,1}, U_{2,1}), (U_{1,2}, U_{2,2}), \ldots$ drawn i.i.d. according
to distribution $p(u_1, u_2)$, and let $Z_1, Z_2, \ldots$ be the channel's random erasures, where $Z_1, Z_2, \ldots$
are drawn i.i.d. according to distribution $q(z)$, all $Z_i$ are independent of the source, and $Z_i = 1$
denotes an erasure in channel use $i$. Assume that the source and channel input alphabets are
equal to the binary field $\mathbb{F}_2$. If $H(U_1, U_2) < 1 - q(1)$, then the error probability averaged over
the ensemble of random linear joint source-channel codes tends to 0 as the codeword lengths
tend to $\infty$.

Proof: See Appendix 2. □ 



Theorem 13 Consider the random source $(U_{1,1}, U_{2,1}), (U_{1,2}, U_{2,2}), \ldots$ drawn i.i.d. according
to distribution $p(u_1, u_2)$, and let $Z_1, Z_2, \ldots$ be the channel's random additive noise, where
$Z_1, Z_2, \ldots$ are drawn i.i.d. according to distribution $q(z)$, and $Z_i$ are independent of the source.
Assume that the source, channel input, channel output, and noise alphabets are all equal to
the binary field $\mathbb{F}_2$. If $H(U_1, U_2) < 1 - H(Z)$, then the error probability averaged over the
ensemble of random linear joint source-channel codes tends to 0 as the codeword lengths tend
to $\infty$.

Proof: See Appendix 2. □ 



4.4 Multiple access networks with input-dependent additive noise 

We have seen that for channels where the NSMA capacity region is equal to the CMA 
capacity region, separation holds for all source pairs. The binary multiple access erasure and 
additive noise channels with input-independent noise are examples. However, when the two 
regions are not the same, separation may not hold for all source pairs. 
We analyze the binary additive noise multiple access channel and show that the CMA and 
NSMA capacity regions may not be the same when the noise is allowed to depend on the 
channel inputs. Define the CMA and NSMA sum capacities, $R_{\text{sum}}^{\text{CMA}}$ and $R_{\text{sum}}^{\text{NSMA}}$, respectively,
as

$$R_{\text{sum}}^{\text{CMA}} = \max_{p_{X_1 X_2}(x_1, x_2)} I(X_1, X_2; Y),$$

$$R_{\text{sum}}^{\text{NSMA}} = \max_{p_{X_1}(x_1)\, p_{X_2}(x_2)} I(X_1, X_2; Y).$$



We compute the maximum loss in sum capacity, $(R_{\text{sum}}^{\text{CMA}} - R_{\text{sum}}^{\text{NSMA}})$, over the ensemble of
all binary additive noise channels. We also obtain the expression for the probability that 
the two sum capacities are unequal for a channel chosen randomly from the ensemble. Our 
results are embodied in the following theorems. 

Theorem 14 For noisy multiple access binary additive noise channels, the maximum dif- 
ference between the CMA and NSMA sum capacities is \ bit per channel use. 

Proof: See Appendix 3. □ 



Theorem 15 The probability that the CMA and NSMA sum capacities are unequal for a 
channel chosen randomly from the ensemble of equally likely channels is |. 

Proof: See Appendix 3. □ 
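The two sum capacities defined above differ only in whether the maximization runs over joint or over product input distributions, and the gap can be seen numerically on a toy channel. The sketch below is a coarse grid search under stated assumptions: the channel `TOY_FLIP` (the probability that the noise bit is 1 for each input pair) is a hypothetical example constructed here, not a channel taken from Appendix 3, and the grid resolution is arbitrary.

```python
from itertools import product
from math import log2

def h(p):
    """Binary entropy in bits, clamped so rounding error cannot leave [0, 1]."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_information(joint, flip):
    """I(X1,X2;Y) for Y = X1 + X2 + Z over F_2, with Pr(Z=1 | x1,x2) = flip[(x1,x2)]."""
    p_y1, cond_entropy = 0.0, 0.0
    for (x1, x2), p in joint.items():
        p1 = flip[(x1, x2)] if (x1 ^ x2) == 0 else 1 - flip[(x1, x2)]
        p_y1 += p * p1                 # contribution to Pr(Y = 1)
        cond_entropy += p * h(p1)      # contribution to H(Y | X1, X2)
    return h(p_y1) - cond_entropy

def product_joint(a, b):
    """Joint pmf of independent inputs with Pr(X1=1) = a and Pr(X2=1) = b."""
    return {(x1, x2): (a if x1 else 1 - a) * (b if x2 else 1 - b)
            for x1, x2 in product((0, 1), repeat=2)}

def sum_capacities(flip, steps=20):
    """Grid-search estimates of the (CMA, NSMA) sum capacities."""
    grid = [i / steps for i in range(steps + 1)]
    nsma = max(mutual_information(product_joint(a, b), flip) for a in grid for b in grid)
    cma = 0.0
    for i, j, k in product(range(steps + 1), repeat=3):
        rest = steps - i - j - k
        if rest < 0:
            continue
        joint = {(0, 0): i / steps, (0, 1): j / steps,
                 (1, 0): k / steps, (1, 1): rest / steps}
        cma = max(cma, mutual_information(joint, flip))
    return cma, nsma

# Hypothetical input-dependent noise: clean output for equal inputs, pure noise otherwise.
TOY_FLIP = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0}
cma, nsma = sum_capacities(TOY_FLIP)
```

For this toy channel, correlated inputs that avoid the noisy input pairs reach 1 bit per channel use, whereas independent inputs cannot avoid them and the search returns 0.5 bit, so separation-based schemes with independent channel inputs lose rate on it.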



4.5 Systematic channel code constructions for multiple access networks

While the sum rate of a multiple access channel code measures the average number of bits per 
channel use successfully communicated by all transmitters to the channel's single receiver, it 
fails to address the question of what fraction of time each transmitter remains silent (in order 
to avoid causing interference with other transmitters). We next investigate this question for 
the noise-free binary multiple access channel. 

Define the "code rate" as the ratio of the number of information bits recovered at the receiver,
to the total number of bits sent by the transmitter, in one time slot. Thus, the code rate is 
a dimensionless quantity with maximal value 1 that represents the overhead (in the form of 
redundancy) required for reliable communication. (Code rate is maximized when overhead 
is minimized.) We assume that when transmitters do not transmit, their channel input is 0. 
The code rate is 1 when there is no multiple access interference and no noise. The noise-free
multiple access network then becomes equivalent to a point-to-point channel without noise. 
We give a systematic linear code construction and prove that it achieves maximal code rate 
and capacity over all codes. We then prove that all codes that achieve the maximal code rate 
also achieve capacity. We also look at the bursty case when transmitters transmit according 
to a Bernoulli random process and propose coding techniques to maximize code rate. For 
such bursty channels, we show that when the information codewords at the input to the 
channel encoders have the same size, maximal expected code rate is achieved by adding 
redundancy at the transmitter with a higher probability of transmission and not adding any 
redundancy at the transmitter with a lower probability of transmission.

Figure 6: Single Slot Model for the Noise-Free Multiple Access Channel.

Figure 7: Pictorial representation of communication scheme.



4.5.1 Single Slot Model 

Since separation holds in binary multiple access additive noise networks when noise is
independent of the inputs, it also holds for noise-free binary multiple access channels ($Y =
X_a + X_b$, with $X_a, X_b, Y \in \mathbb{F}_2$) where interference from other users limits capacity.
We consider a discrete, time-slotted channel where transmissions start at the beginning of
the slot and occur over the length of a slot. The transmitters do not coordinate their
transmissions and hence, time sharing is not possible. Figure 6 shows a single slot model of the
noise-free multiple access channel. Information bits coming out of the source coders in one
slot duration are represented as $\mathbf{a}$ and $\mathbf{b}$, respectively. Vectors $\mathbf{a}$ and $\mathbf{b}$ have sizes $n_a$ and $n_b$,
respectively, and have i.i.d. and uniformly distributed entries. The elements of $\mathbf{a}$ and $\mathbf{b}$ are in
$\mathbb{F}_2$. For this model, all operations, matrices and vectors are in the binary field. We refer to
$\mathbf{a}$ and $\mathbf{b}$ as transmit vectors and assume without loss of generality that $n_a \geq n_b$. Let $m_a$ and
$m_b$ be the numbers of redundant bits added to vectors $\mathbf{a}$ and $\mathbf{b}$, respectively, by the systematic channel
code. We use $l_a$ and $l_b$ to denote the lengths of the vectors obtained by channel coding on
$\mathbf{a}$ and $\mathbf{b}$, respectively, giving $l_a = n_a + m_a$ and $l_b = n_b + m_b$. In general, $l_a \neq l_b$, so both
transmitters may not transmit for the entire slot duration. However, at least one transmitter
transmits for the whole slot duration. Therefore, the slot length is given by

$$S = \max(l_a, l_b) \text{ bits.} \qquad (7)$$

Let $L_a$ be an $(n_a + n_b) \times n_a$ matrix and $L_b$ be an $(n_a + n_b) \times n_b$ matrix. These are the
generator matrices for the channel codes at a and b, respectively. Only the first $l_a$ rows of
$L_a$ and the first $l_b$ rows of $L_b$ are non-zero. $X_a$ and $X_b$ are the codewords that are sent over
the channel, and they interfere additively over the binary field. At the decoder, a matrix $T$
of dimension $(v_a + v_b) \times (n_a + n_b)$ decodes the received vector to generate a subset of $\mathbf{a}$ and
$\mathbf{b}$. Let $Q$ be the decoded output containing $v_a$ ($v_a \in \{1, \ldots, n_a\}$) bits of $\mathbf{a}$ denoted by $\mathbf{a}'$ and
$v_b$ ($v_b \in \{1, \ldots, n_b\}$) bits of $\mathbf{b}$ denoted by $\mathbf{b}'$. We have the following relations:

$$X_a = L_a \mathbf{a}, \qquad X_b = L_b \mathbf{b}, \qquad Y = L_a \mathbf{a} + L_b \mathbf{b},$$

$$Q = TY = (T L_a) \mathbf{a} + (T L_b) \mathbf{b}.$$

Figure 7 describes the communication scheme pictorially.



4.5.2 Maximal Code Rate 

In this section, we derive the maximal code rate of the noise-free binary multiple access 
channel. For the given interference channel, the NSMA capacity region is the set of all rate
pairs, $(R_a, R_b)$, satisfying

$$R_a \leq 1, \qquad (8)$$

$$R_b \leq 1, \qquad (9)$$

$$R_{\text{sum}}^{\text{NSMA}} = R_a + R_b \leq 1, \qquad (10)$$

where the rates are in bits per channel use.

We now find the maximal code rate for this channel. Transmitters a and b transmit $n_a$ and
$n_b$ information bits per slot, respectively, and the codewords transmitted have lengths of $l_a$
and $l_b$ bits, respectively. The transmission rates are

$$R_a = \frac{n_a}{S}, \qquad R_b = \frac{n_b}{S}, \qquad R_{\text{sum}}^{\text{NSMA}} = \frac{n_a + n_b}{S}. \qquad (11)$$

We assume transmission at a rate within the capacity region, giving

$$n_a + n_b \leq S, \qquad (12)$$

by (10) and (11). The code rate is a dimensionless quantity given by

$$\begin{aligned}
C_{\text{rate}} &= \frac{n_a + n_b}{l_a + l_b} \qquad (13) \\
&= \frac{n_a + n_b}{\min(l_a, l_b) + \max(l_a, l_b)} \qquad (14) \\
&= \frac{n_a + n_b}{\min(l_a, l_b) + S} \qquad (15) \\
&\leq \frac{n_a + n_b}{n_b + n_a + n_b}. \qquad (16)
\end{aligned}$$

Equation (15) is due to (7). Expression (16) follows from (12) and the fact that $n_b \leq n_a$
implies $n_b \leq \min(l_a, l_b)$. Thus, we have

$$C_{\text{rate}} \leq \frac{n_a + n_b}{n_a + 2 n_b}. \qquad (17)$$


4.5.3 Code Construction 

We now construct systematic codes that achieve the capacity and maximal code rate for the
binary noise-free multiple access channel; we call these codes optimal codes. The code rate,
transmission rates $R_a$, $R_b$, and sum rate $R_{\text{sum}}^{\text{NSMA}}$ for our model are given by

$$C_{\text{rate}} = \frac{v_a + v_b}{l_a + l_b}, \qquad R_a = \frac{v_a}{S}, \qquad R_b = \frac{v_b}{S}, \qquad R_{\text{sum}}^{\text{NSMA}} = \frac{v_a + v_b}{S}.$$

Note that there may be many optimal codes. In Appendix 4, subsection A.4.1, we describe
one such construction and prove its optimality. We also establish Theorem 16, which shows
that maximal code rate achieving codes are capacity achieving.
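One simple systematic code that meets the code rate bound $(n_a + n_b)/(n_a + 2 n_b)$ can be sketched as follows. This is an illustrative construction assumed here, not necessarily the one given in Appendix 4: transmitter a sends its vector followed by a repeat of its first $n_b$ bits, transmitter b sends its vector with no redundancy and then stays silent (channel input 0), and all function names are hypothetical.

```python
from fractions import Fraction

def encode_a(a, n_b):
    """Systematic codeword for the longer vector: a, then a repeat of its first n_b bits."""
    return list(a) + list(a[:n_b])

def encode_b(b, slot):
    """No redundancy for the shorter vector; zeros model silence for the rest of the slot."""
    return list(b) + [0] * (slot - len(b))

def channel(x_a, x_b):
    """Noise-free binary multiple access channel: symbol-wise addition over F_2."""
    return [u ^ v for u, v in zip(x_a, x_b)]

def decode(y, n_a, n_b):
    """Recover both transmit vectors from the received slot."""
    a_head = y[n_a:n_a + n_b]                        # interference-free repeat of a[:n_b]
    a_hat = a_head + y[n_b:n_a]                      # a[n_b:] arrives while b is silent
    b_hat = [y[i] ^ a_head[i] for i in range(n_b)]   # strip a from the first n_b symbols
    return a_hat, b_hat

def code_rate(n_a, n_b, l_a, l_b):
    """Information bits recovered divided by total bits sent by both transmitters."""
    return Fraction(n_a + n_b, l_a + l_b)

a, b = [1, 0, 1], [1, 1]                             # n_a = 3 >= n_b = 2
slot = len(a) + len(b)                               # S = l_a = n_a + n_b
y = channel(encode_a(a, len(b)), encode_b(b, slot))
a_hat, b_hat = decode(y, len(a), len(b))
rate = code_rate(len(a), len(b), slot, len(b))
```

Here $l_a = n_a + n_b$ and $l_b = n_b$, so the slot carries $n_a + n_b$ information bits in $S = n_a + n_b$ channel uses and the code rate equals $(n_a + n_b)/(n_a + 2 n_b)$, with no redundancy added to the shorter transmit vector.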

Theorem 16 For a noise-free binary multiple access channel, codes achieve the maximal 
code rate if and only if they are capacity achieving and no redundancy is added to the smaller 
transmit vector. 

Proof: See Appendix 4, subsection A.4.2. □



4.5.4 The case when transmitters are bursty 

In our discussion of systematic code constructions, we assume that each transmitter has a 
codeword to transmit in each slot. We now consider the case when the channel encoders 
may not always have an input information codeword to encode. 

We assume that each transmitter transmits in a slot according to a Bernoulli process. The 
lower the probability of transmission, the burstier the transmitter. We would like to know 
what coding techniques to use in order to obtain the maximal code rate over a large number of 
transmissions. Intuitively, it can be expected that bursty transmissions will reduce multiple 
access interference and increase the code rate. Moreover, we should be able to obtain a 
code rate of 1 (the code rate of a point-to-point noise-free channel) in the limit that one 
transmitter stops transmitting. In this section, we illustrate coding techniques for bursty 
multiple access and also show that the limits that we expect actually hold. The result is 
embodied in the following theorem: 

Theorem 17 When the information codewords at the input to the channel encoders have the 
same size, maximal expected code rate is achieved over the noise-free binary multiple access 
channel by adding redundancy at the less bursty transmitter and not adding any redundancy 
at the more bursty transmitter. 



Proof: See Appendix 4, subsection A.4.2. □

Figure 8: A broadcast system source code with three receivers.

5 Broadcast Networks 

The next simple model under consideration is the broadcast system, where one transmitter 
sends information to a collection of receivers. We consider the binary erasure channel and 
show that linear joint source-channel codes are optimal.

5.1 Linear Source Codes 

A broadcast system source code comprises a single encoder and a collection of decoders. Since 
the case with two receivers has special structure absent from general broadcast system source 
codes [3*2*1 l""3~] , we focus on the three-receiver system of Figure 8. The results given simplify
easily to the two-receiver case and generalize to more receivers. Note that, since we consider
discrete channels, the degraded broadcast channel converses of |2Z] or of |H|, which allow 
no common information or partial common information, are applicable. In the given broad- 
cast system source coding model, samples of source vector (U\,U 2 ,U 3 ,Ui 2 ,U 23 ,Ui 3 ,Ui 23 ) 
are drawn i.i.d from some distribution p(ui, u 2 , u 3 , ux 2 , u 23 , Ui 3 , U\ 23 ). The source descrip- 
tion contains components of rates Ri, R 2 , R 3 , Ri 2 , R 23 , R i3 , and i?i23- Decoder 1 receives 
the rate R\, R± 2 , R i3 , and R\ 23 descriptions and uses them to decode (U\, U% 2 , Ui 3 , Ui 23 ). 
Decoder 2 receives the rate R 2 , R± 2 , R 23 , and -R123 descriptions and uses them to decode 
(U 2 , Ui 2 , U 23 , U\ 23 ). Decoder 3 receives the rate R 3 , R± 3 , R 23 , and i?i 2 3 descriptions and uses 
them to decode (U 3 , Ui 3 , U 23 , Ui 23 ). While several receivers decode the common information, 
each has a different subset of the descriptions with which to decode. 
The following theorem proves the optimality of linear broadcast system source codes. 

Theorem 18 Consider samples of source vector $(U_1, U_2, U_3, U_{12}, U_{23}, U_{13}, U_{123})$ drawn i.i.d.
according to distribution $p(u_1, u_2, u_3, u_{12}, u_{23}, u_{13}, u_{123})$ on $(\mathbb{F}_2)^7$ and linear
broadcast system source codes of rate-$(R_1, R_2, R_3, R_{12}, R_{23}, R_{13}, R_{123})$ and codeword
length $n$. For any $s \subseteq \{1, 2, 3, 12, 23, 13, 123\}$, let $u_s = (u_a)_{a \in s}$, and let
$(nR)_s = \sum_{a \in s} \lceil nR_a \rceil$. Then for any rates satisfying

$$\begin{aligned}
(nR)_s &> H(U_s | U_{S_1 - s}) && \forall s \subseteq S_1 = \{1, 12, 13, 123\},\ s \neq \emptyset, \\
(nR)_s &> H(U_s | U_{S_2 - s}) && \forall s \subseteq S_2 = \{2, 12, 23, 123\},\ s \neq \emptyset, \\
(nR)_s &> H(U_s | U_{S_3 - s}) && \forall s \subseteq S_3 = \{3, 13, 23, 123\},\ s \neq \emptyset,
\end{aligned}$$

the error probability averaged over the ensemble of linear broadcast system source codes tends
to 0 as codeword length tends to $\infty$.

Proof: See Appendix 5. □

Figure 9: (a) The erasure broadcast channel and (b) a physically degraded channel with the
same capacity ($\alpha = (q_2(1) - q_1(1))/(1 - q_1(1))$ and all erasures propagate as erasures).
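As a reading aid for Theorem 18, the following minimal Python sketch enumerates the index sets $S_1$, $S_2$, $S_3$ and the nonempty subsets $s$ over which the rate conditions range. The names `DECODER_SETS` and `rate_constraints` are illustrative and do not come from the paper; the sketch assumes that only nonempty $s$ contribute a condition.

```python
from itertools import combinations

# Description indices available to each decoder (S_1, S_2, S_3 in Theorem 18).
DECODER_SETS = {
    1: ("1", "12", "13", "123"),
    2: ("2", "12", "23", "123"),
    3: ("3", "13", "23", "123"),
}

def rate_constraints():
    """List (decoder, s, S_i - s) for every nonempty subset s of each S_i."""
    out = []
    for dec, S in DECODER_SETS.items():
        for size in range(1, len(S) + 1):
            for s in combinations(S, size):
                rest = tuple(a for a in S if a not in s)
                out.append((dec, s, rest))
    return out

constraints = rate_constraints()
for dec, s, rest in constraints[:3]:
    s_txt, rest_txt = ",".join(s), ",".join(rest)
    print(f"decoder {dec}: s = {{{s_txt}}}, conditioning on {{{rest_txt}}}")
```

Each decoder sees four descriptions, so each contributes $2^4 - 1 = 15$ conditions, for 45 in total.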



5.2 Linear Channel Codes for the Erasure Broadcast Channel 

We next consider the erasure broadcast channel models shown in Figure 9(a) and (b). A
single channel input is sent to receivers 1 and 2. In the first model, the output at receiver 1
is an erasure with probability $q_1(1)$ and the transmitted value with probability $q_1(0)$;
likewise, the output at receiver 2 is an erasure with probability $q_2(1)$ and is otherwise received
correctly. Without loss of generality, assume that $q_1(1) < q_2(1)$. In this model, erasures are
assumed to be independent events. In the model of Figure 9(b), the erasure probabilities for
the two receivers are the same as in the first model, but the erasures are dependent random
variables, with all erasures at the first receiver propagating to the second receiver. By PI Theorem 14.6.1], the
capacity of the broadcast channel depends only on the conditional marginal distributions
$p(y_1|x)$ and $p(y_2|x)$; thus the capacities of the two channels shown and of all channels with the
same $p(y_1|x)$ and $p(y_2|x)$ (regardless of the statistical dependencies between erasure events
$Z_1$ and $Z_2$) are identical.$^5$ Note that the elegant and simple converse for degraded BSC
broadcast channels, which relies on properties of binary sequences, might be readily
extended to our model, albeit without the generality of [37, 38].

$^5$ All channel models considered here assume $Z_1$ and $Z_2$ are independent of the channel input.
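A short check, given as a sketch under the Figure 9 definitions, confirms that the degraded model reproduces the marginal erasure probability at receiver 2. Here $\alpha$ is the parameter from the Figure 9 caption, assumed to be the probability that a symbol received correctly at receiver 1 is erased on its way to receiver 2.

```latex
\begin{align*}
\Pr(Y_2 = E) &= q_1(1) + \bigl(1 - q_1(1)\bigr)\,\alpha \\
  &= q_1(1) + \bigl(1 - q_1(1)\bigr)\,\frac{q_2(1) - q_1(1)}{1 - q_1(1)} \\
  &= q_2(1).
\end{align*}
```

Both models therefore share the same $p(y_1|x)$ and $p(y_2|x)$, which is all the capacity argument requires.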

5.2.1 Capacity of Erasure Broadcast Channel 

Lemma 5 proves time-sharing to be optimal for broadcast coding over the given family of
channels. The result of Theorem 19, which establishes the rates achievable by linear broadcast
channel codes on the erasure broadcast channel, is then immediate by the previous linearity of
time-sharing argument. The given bound is optimal for the case of no common information. No
converse exists for the case of common information, but the given linear coding achievability
results agree with the best known achievability results on the binary erasure channel.

Lemma 5 Consider a binary erasure channel with output alphabets $\{0, 1, E\}$ at each of two
receivers. The erasure sequences $Z_{1,1}, Z_{1,2}, \ldots$ and $Z_{2,1}, Z_{2,2}, \ldots$ are drawn i.i.d.
according to distributions $q_1(z_1)$ and $q_2(z_2)$, respectively, where $Z_{i,j} = 1$ denotes an erasure
event at receiver $i$ at time $j$. The joint distribution $q(z_1, z_2)$ may be any distribution with the
given marginals, but the channel noise is independent of the channel input by assumption. The
capacity region for sending independent information to the two receivers is described by

$$\frac{R_1}{1 - q_1(1)} + \frac{R_2}{1 - q_2(1)} < 1.$$

For any achievable independent information rate pair $(R_1, R_2)$, the rate triple
$(R_1', R_2', R_{12}') = (R_1, R_2 - R_0, R_0)$ with common information rate $R_{12}'$ and independent
information rates $R_1'$ and $R_2'$ is also achievable for any $R_0 < R_2$.

Proof: See Appendix 5. □ 
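The region of Lemma 5 is easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and the strict inequalities follow the lemma as stated.

```python
def in_erasure_bc_region(r1, r2, q1, q2):
    """True if r1/(1-q1) + r2/(1-q2) < 1 for erasure probabilities q1, q2 in [0, 1)."""
    return r1 >= 0 and r2 >= 0 and r1 / (1 - q1) + r2 / (1 - q2) < 1

def with_common_information(r1, r2, r0):
    """Map an independent-rate pair to (R1', R2', R12') = (r1, r2 - r0, r0)."""
    if not 0 <= r0 < r2:
        raise ValueError("common rate must satisfy 0 <= r0 < r2")
    return (r1, r2 - r0, r0)

# Receiver 1 is the better receiver (q1 < q2), as assumed in the text.
print(in_erasure_bc_region(0.25, 0.1, q1=0.5, q2=0.75))
print(with_common_information(0.25, 0.5, 0.25))
```

The second helper only relabels part of the rate intended for the weaker receiver as common information, which is the conversion the lemma describes.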

The following theorem shows that random linear channel codes are optimal for the erasure 
broadcast channel. 

Theorem 19 Consider an erasure channel with input alphabet $\mathbb{F}_2$ and output alphabets
$\{0, 1, E\}$ at each of two receivers. The erasure sequences $Z_{1,1}, Z_{1,2}, \ldots$ and
$Z_{2,1}, Z_{2,2}, \ldots$ are drawn i.i.d. according to distributions $q_1(z_1)$ and $q_2(z_2)$, respectively,
where $Z_{i,j} = 1$ denotes an erasure event at receiver $i$ at time $j$. The joint distribution
$q(z_1, z_2)$ may be any distribution with the given marginals, but the channel noise is independent
of the channel input by assumption. Let $\{B_n\}_{n=1}^{\infty}$ describe a sequence of channel codes.
Each $B_n$ is an $n \times (\lfloor nR_1 \rfloor + \lfloor nR_2 \rfloor)$ matrix with elements chosen i.i.d.
Bernoulli(1/2). If $R_1/(1 - q_1(1)) + R_2/(1 - q_2(1)) < 1$, then the expected error probability
$E[P_e(B_n)] \to 0$ as $n \to \infty$.

5.3 Linear Joint Source-Channel Coding for the Erasure Broadcast Networks

We have seen in the previous subsections that linear source and channel codes are optimal
for the erasure broadcast channel. Moreover, Lemma 5 shows that time-sharing is optimal
for broadcast coding, which establishes that source-channel separation holds for this class of
channels. Therefore, the linear source and channel codes can be combined to yield optimal
linear joint source-channel codes. This shows the optimality of linear joint source-channel
codes for the erasure broadcast channel.

5.4 Additive Noise Broadcast Networks 

To date, there exist no results proving the optimality of linear broadcast codes for the
additive noise broadcast channel model. In this case, time-sharing is not the optimal
solution pQ, and direct application of the techniques used in this paper fails to achieve the
optimal performance. The stumbling block is that we cannot apply the construction used to
build channel input $X$ from the auxiliary random variable $W$ to be decoded by the second
receiver. (See the proof of Lemma 5.) In particular, we cannot achieve the appropriate
(non-uniform) cross-over probability from the auxiliary random variable to $X$ using an additive
signal created by a linear code. In this case, as in Theorem 19, the time-sharing solution is
achievable with linear coding. While the time-sharing solution gives a bound on the performance
achievable by linear coding, the time-sharing solution is sub-optimal for this problem.
Linear coding performance beyond the time-sharing bound may or may not be possible. A
possible strategy for trying to move linear codes beyond the time-sharing bound is described
in Appendix 5.

6 End-to-End Coding and Conclusions 
6.1 End-to-End Coding 

The preceding sections treat the topics of source and channel coding using the tools of linear 
network coding, bringing previously disparate areas into a common framework. We end by 
demonstrating that this unification is not only useful in its combination of tasks once treated 
entirely separately but is in fact crucial to achieving optimal, reliable communication. 

Traditional routing techniques rely entirely on repeat and forward strategies for getting 
a source from its point of origin to its desired destination. The network coding literature 
demonstrates the failure of that approach in achieving the optimal performance for some 
simple multi-cast examples [31]. We next demonstrate the failure of the network coding 
model. 

The common network coding model assumes that all sources are independent and all 
links are noiseless. Implicit in the given model is the assumption that source and channel 
coding are performed separately from network coding at the edges of the network, so that 
the internal nodes need only pass along the information to the appropriate receivers. We 
next demonstrate that source-network separation and channel-network separation both fail. 
That is, there exist networks for which network coding and source coding must be performed 
jointly in order to achieve the optimal performance. Likewise, there exist networks for which 
network coding and channel coding must be performed jointly in order to achieve the optimal 
performance. We use a sequence of simple examples to prove these results. 

Example 1: The network of Figure 10 comprises two transmitters and three receivers.
Receiver node 1 wishes to receive $U_1$, receiver node 2 wishes to receive $U_2$, and receiver node
3 wishes to receive both $U_1$ and $U_2$. Sources $(U_1, U_2)$ are dependent random variables, with
$H(U_1, U_2) < H(U_1) + H(U_2)$. All network links are lossless, and the capacities are noted
in the figure. Achieving reliable communication in this example requires the descriptions
received by nodes 1 and 2 to be dependent random variables and requires sources $U_1$ and $U_2$
to be re-compressed at nodes 1 and 2, respectively. Thus separation of source coding and
network coding fails. □

Figure 10: A network for which separation of source and network coding fails.

Figure 11: A network for which separation of channel and network coding fails.

Example 2: Consider the network shown in Figure 11. The channel between node 0
and nodes 1 and 2 is a broadcast erasure channel with independent erasures of probabilities
$q_1(1) = q_2(1) = q$. The network between nodes 1 and 2 and node 3 is a multiple access
channel without interference. The network coding approach requires labeling each link with
its corresponding link capacity. If $R_1$ and $R_2$ are the capacities of the edges to receivers 1
and 2, then $R_1 + R_2$ must be less than $1 - q$ by Theorem 19. The links from node 1 to node 3
and from node 2 to node 3 are both lossless, with capacity 1 bit per channel use. Optimal
network coding on the given channel gives a maximal rate of $1 - q$ from the encoder to the
decoder. We contrast the above separated channel and network coding approach with an
end-to-end coding strategy. In this case, we do not force zero error probability between node 0
and nodes 1 and 2 but instead simply forward the information received by those nodes to
the decoder. The capacity of the resulting code is $1 - q^2$ since receiver 3 suffers an erasure
only if both node 1 and node 2 receive erasures. □

In addition to illustrating the failure of separate channel and network coding schemes, 
Example 2 serves as a reminder that general network capacities cannot be proven by break- 
ing the network into canonical elements and solving them independently. Sadly, the strategy 
given for that example is not always optimal. In particular, the strategy discussed in Ex- 
ample 2 demonstrates that failure to decode at intermediate nodes of the network can yield 
performance superior to that achieved by decoding at intermediate nodes. Example 3 gives 
an example that teaches the opposite lesson.

Figure 12: A network for which separation of source and network coding fails. The links
between nodes 1 and 2 and nodes 2 and 3 are independent erasure channels with probabilities
of erasure $q_1(1)$ and $q_2(1)$, respectively.

Example 3: Consider the channel of Figure 12. The links between nodes 1 and 2 and
nodes 2 and 3 are independent erasure channels with probabilities of erasure $q_1(1)$ and $q_2(1)$,
respectively. If we do not decode at the intermediate node, then the maximal achievable
rate from node 1 to node 3 is $(1 - q_1(1))(1 - q_2(1))$. Decoding at node 2 yields maximal
achievable rate $\min\{1 - q_1(1), 1 - q_2(1)\} \geq (1 - q_1(1))(1 - q_2(1))$.
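The rates quoted in Examples 2 and 3 can be compared directly. The following sketch simply evaluates the closed-form expressions from the text; the function names are illustrative, and `q`, `q1`, `q2` stand for the erasure probabilities $q$, $q_1(1)$, $q_2(1)$.

```python
def example2_rates(q):
    """Return (separated, forwarding) rates for Example 2: 1 - q versus 1 - q**2."""
    return 1 - q, 1 - q * q

def example3_rates(q1, q2):
    """Return (no_decoding, decoding_at_node_2) rates for Example 3."""
    return (1 - q1) * (1 - q2), min(1 - q1, 1 - q2)

for q in (0.1, 0.5, 0.9):
    sep, fwd = example2_rates(q)
    nodec, dec = example3_rates(q, q)
    print(f"q={q}: Ex.2 separated={sep:.2f}, forwarding={fwd:.2f}; "
          f"Ex.3 no-decode={nodec:.2f}, decode={dec:.2f}")
```

For every $q$ strictly between 0 and 1, forwarding wins in Example 2 while decoding at the intermediate node wins in Example 3, which is the contrast the two examples are meant to draw.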

The failure of separation in Examples 1 and 2 and the contrasting lessons regarding 
decoding at intermediate nodes demonstrated by Examples 2 and 3 make the case for the need 
for end-to-end coding in network environments. The success of the linear coding technique 
in network coding, source coding, and channel coding suggests that a unified approach that 
obviates the need for separate routing, compression, and error correction codes may be 
within reach. In contrast, the failure of separation across canonical network systems seems 
to present a far greater challenge to optimal code design in networks. 

6.2 Conclusions 

In this paper, we consider networks operating over a common finite field. We show that linear 
codes are optimal for the point-to-point, multiple access and erasure broadcast networks. We 
prove that for multiple access networks, source-channel separation holds as long as noise is
independent of inputs. We show that separation may fail for binary multiple access networks
with input-dependent additive noise. We present an optimal systematic multiple access
channel code construction and also provide coding techniques when transmitters are bursty. 
We show with examples that design for individual network modules may yield poor results 
when such modules are concatenated, establishing the necessity for end-to-end coding. Thus, 
we show that it is the lack of decomposability into canonical network modules that is a much 
greater challenge than the lack of separation between source and channel coding. 

Appendix 1 

Proof of Lemma 3

The proof of this lemma uses the analogue of the Darmois-Skitovich theorem for discrete
periodic Abelian groups by Fel'dman [45]. Let us proceed by contradiction. Suppose that
the $j$th column of $a$ has non-zero elements in positions $i$ and $i'$ ($i' \neq i$). Then $V_i$ and $V_{i'}$
both experience a non-zero contribution from $U_j$. In this case, the independence of $V_i$ and
$V_{i'}$ requires that $p_j$ be a uniform probability mass function, which gives a contradiction. □

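Proof of Theorem 2

To accomplish linear channel coding for the erasure channel, we use an $n \times \lfloor nR \rfloor$ linear
generator matrix $B_n$ and a conceptually simple non-linear decoder. The linear channel
encoder is defined by

$$\gamma_n(v^{\lfloor nR \rfloor}) = B_n v.$$

Let $X^n$ denote the channel input and $Y^n$ denote the corrupted channel output. For any
$y^n \in \{0, 1, E\}^n$ define the decoder output as

$$\begin{cases}
v^{\lfloor nR \rfloor} & \text{if } (B_n v)_i = y_i \text{ for all } i \text{ s.t. } y_i \in \mathbb{F}_2 \\
 & \text{and } \nexists\, \hat{v} \neq v \text{ s.t. } (B_n \hat{v})_i = y_i \text{ for all } i \text{ s.t. } y_i \in \mathbb{F}_2, \\
\hat{V}^{\lfloor nR \rfloor} & \text{otherwise,}
\end{cases}$$

where for any $v \in \mathbb{F}_2^{\lfloor nR \rfloor}$, $(B_n v)_i$ is the $i$th component of the vector $B_n v$. Again,
decoding to $\hat{V}^{\lfloor nR \rfloor}$ denotes a random decoder output.

For the erasure channel, we can immediately decode $Z^n$ from the received string $Y^n$. For
any $z^n \in \mathbb{F}_2^n$, define $\mathcal{E}(z^n) = \{e \in \mathbb{F}_2^n : e_i = z_i \ \forall i \text{ s.t. } z_i = 0\}$. A decoding
error occurs if there exists a $\hat{v} \neq V$ for which $B_n V - B_n \hat{v} = B_n(V - \hat{v}) \in \mathcal{E}(Z^n)$, since
any such $\hat{v}$ would be mapped to the same channel output by $Z^n$. For any $z^n$ with
$\sum_{i=1}^{n} z_i = k$, $|\mathcal{E}(z^n)| = 2^k$. Using the definition of the typical set, $z^n \in A_\epsilon^{(n)}(q)$ implies
that $\sum_{i=1}^{n} z_i \leq n(q(1) + \epsilon')$, where $\epsilon' = \epsilon / \log(q(1)/q(0))$. Thus for any fixed
$z^n \in A_\epsilon^{(n)}(q)$ and nonzero $w \in \mathbb{F}_2^{\lfloor nR \rfloor}$,
$\Pr(B_n w \in \mathcal{E}(z^n)) \leq 2^{-n} 2^{n(q(1) + \epsilon')}$ (since $B_n w$ is uniformly distributed by the
argument in the proof of Theorem^), giving

$$\begin{aligned}
E[P_e^{(n)}(B_n)]
&= E[\Pr(\text{Error} \wedge Z^n \notin A_\epsilon^{(n)}(q))] + E[\Pr(\text{Error} \wedge Z^n \in A_\epsilon^{(n)}(q))] \\
&\leq \epsilon_n + \sum_{v, \hat{v}} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(v^{\lfloor nR \rfloor}) q(z^n) 1(\hat{v} \neq v) \Pr(B_n(v - \hat{v}) \in \mathcal{E}(z^n)) \\
&\leq \epsilon_n + \sum_{v} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(v^{\lfloor nR \rfloor}) q(z^n) 2^{\lfloor nR \rfloor} 2^{-n} 2^{n(q(1) + \epsilon')} \\
&\leq \epsilon_n + 2^{-n(1 - q(1) - \epsilon') + \lfloor nR \rfloor}
\end{aligned}$$

for some $\epsilon_n \to 0$. Here $A_\epsilon^{(n)}(p)$ is the typical set for the source distribution and
$A_\epsilon^{(n)}(q)$ is the typical set for the noise. The expected error probability decays to zero as
$n$ grows without bound provided that $R < 1 - q(1) - \epsilon'$. □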
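The decoder in this proof succeeds exactly when no nonzero $w$ satisfies $B_n w \in \mathcal{E}(z^n)$, that is, when the rows of $B_n$ at unerased positions have full column rank over $\mathbb{F}_2$. The following Monte Carlo sketch illustrates the threshold at $R = 1 - q(1)$. It is not the authors' code: the helper names, the bitmask representation of rows, and the parameter values are assumptions made for illustration.

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)  # XOR with b only when that clears b's leading bit in r
        if r:
            basis.append(r)
    return len(basis)

def unique_decoding_rate(n, k, q, trials, rng):
    """Fraction of trials in which the unerased rows of a random n x k matrix have rank k."""
    ok = 0
    for _ in range(trials):
        rows = [rng.getrandbits(k) for _ in range(n)]      # random generator matrix B_n
        kept = [row for row in rows if rng.random() >= q]   # rows at unerased positions
        ok += gf2_rank(kept) == k
    return ok / trials

rng = random.Random(0)
# Erasure probability 0.5, so the threshold rate is 0.5 (k = 100 when n = 200).
below = unique_decoding_rate(n=200, k=60, q=0.5, trials=50, rng=rng)   # rate 0.3
above = unique_decoding_rate(n=200, k=140, q=0.5, trials=50, rng=rng)  # rate 0.7
print(below, above)
```

Below the threshold almost every draw is uniquely decodable, while above it there are typically fewer unerased rows than unknowns, so unique decoding is impossible.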

Proof of Theorem St 

The joint source-channel code's encoder is defined by

$$\zeta_n(u^n) = C_n u.$$

Denote the random channel input and output by $X^n$ and $Y^n$, respectively. For any
$y^n \in \{0, 1, E\}^n$ the decoder is defined by

$$\eta_n(y^n) = \begin{cases}
u^n & \text{if } (C_n u)_i = y_i \text{ for all } i \text{ s.t. } y_i \in \mathbb{F}_2 \\
 & \text{and } \nexists\, \hat{u} \neq u \text{ s.t. } (C_n \hat{u})_i = y_i \text{ for all } i \text{ s.t. } y_i \in \mathbb{F}_2, \\
\hat{U}^n & \text{otherwise.}
\end{cases}$$

Here, $(C_n u)_i$ denotes the $i$th component of vector $C_n u$. The error probability for code $C_n$ is

$$P_e(C_n) = \Pr(\eta_n(Y^n) \neq U^n),$$

where $U^n$ and $Y^n$ are the random source vector and channel output, respectively. The theorem
states that the expected error probability for a randomly chosen linear code $C_n$ decays
to zero as $n$ grows without bound.

Again, we can immediately decode $Z^n$ from the received string $Y^n$, and a decoding error
occurs if there exists a $\hat{u} \neq U$ for which $C_n(U - \hat{u}) \in \mathcal{E}(Z^n)$. Thus

$$\begin{aligned}
E[P_e^{(n)}(C_n)]
&= E[\Pr(\text{Error} \wedge (U^n \notin A_\epsilon^{(n)}(p) \vee Z^n \notin A_\epsilon^{(n)}(q)))] \\
&\quad + E[\Pr(\text{Error} \wedge U^n \in A_\epsilon^{(n)}(p) \wedge Z^n \in A_\epsilon^{(n)}(q))] \\
&\leq 2\epsilon_n + \sum_{u^n, \hat{u}^n \in A_\epsilon^{(n)}(p)} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(u^n) q(z^n) 1(\hat{u} \neq u) \Pr(C_n(u - \hat{u}) \in \mathcal{E}(z^n)) \\
&\leq 2\epsilon_n + \sum_{u^n \in A_\epsilon^{(n)}(p)} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(u^n) q(z^n) 2^{n(H(U) + \epsilon)} 2^{-n} 2^{n(q(1) + \epsilon')} \\
&\leq 2\epsilon_n + 2^{-n(1 - q(1) - \epsilon' - H(U) - \epsilon)}
\end{aligned}$$

for some $\epsilon_n \to 0$. Thus the expected error probability decays to zero as $n$ grows without
bound provided that $H(U) < 1 - q(1) - \epsilon - \epsilon'$. □

Proof of Theorem 5

We now consider a linear joint source-channel code for the binary additive noise channel.
Again, given $n \times n$ matrix $C_n$, we define the encoder as

$$\zeta_n(u^n) = C_n u.$$

Given random channel input $C_n U$, the channel output is

$$Y = C_n U + Z.$$

The decoder is

$$\eta_n(y^n) = \begin{cases}
u^n & \text{if } u^n \in A_\epsilon^{(n)}(p) \text{ and } \exists\, z^n \in A_\epsilon^{(n)}(q) \text{ s.t. } C_n u + z = y \\
 & \text{and } \nexists\, (\hat{u}^n, \hat{z}^n) \in (A_\epsilon^{(n)}(p) \cap \{u\}^c) \times A_\epsilon^{(n)}(q) \text{ s.t. } C_n \hat{u} + \hat{z} = y, \\
\hat{U}^n & \text{otherwise.}
\end{cases}$$

The error probability for code $C_n$ is

$$P_e(C_n) = \Pr(\eta_n(Y^n) \neq U^n).$$

An error occurs if two source sequences are mapped to the same channel input vector or
if there exist distinct noise vectors that map distinct channel input vectors to the same
channel output. In the first case, $C_n U = C_n \hat{u}$ for some $\hat{u} \neq U$, and in the second case,
$C_n U + Z = C_n \hat{u} + \hat{z}$ for some $\hat{u} \neq U$ and $\hat{z} \neq Z$. Restricting our attention to typical
source and noise vectors, an error occurs if there exists a $\hat{u} \in A_\epsilon^{(n)}(p)$ such that $\hat{u} \neq U$ and
$C_n(\hat{u} - U) \in \{0\} \cup \{\hat{z} - Z : \hat{z} \in A_\epsilon^{(n)}(q)\}$. For any fixed $u - \hat{u} \neq 0$ and randomly
chosen $C_n$, the coefficients of vector $C_n(u - \hat{u})$ are sums of fixed numbers of i.i.d.
Bernoulli(1/2) values. Thus $\Pr(C_n(u - \hat{u}) = w) = 2^{-n}$ for all $w \in \mathbb{F}_2^n$ and

$$\begin{aligned}
E[P_e^{(n)}(C_n)]
&= E[\Pr(\text{Error} \wedge (U^n \notin A_\epsilon^{(n)}(p) \vee Z^n \notin A_\epsilon^{(n)}(q)))] \\
&\quad + E[\Pr(\text{Error} \wedge U^n \in A_\epsilon^{(n)}(p) \wedge Z^n \in A_\epsilon^{(n)}(q))] \\
&\leq 2\epsilon_n + \sum_{(u^n, z^n), (\hat{u}^n, \hat{z}^n) \in A_\epsilon^{(n)}(p) \times A_\epsilon^{(n)}(q)} p(u^n) q(z^n) 1(\hat{u} \neq u) \Pr(C_n(u - \hat{u}) = \hat{z} - z) \\
&\leq 2\epsilon_n + \sum_{(u^n, z^n) \in A_\epsilon^{(n)}(p) \times A_\epsilon^{(n)}(q)} p(u^n) q(z^n) 2^{n(H(U) + \epsilon)} 2^{n(H(q) + \epsilon)} 2^{-n} \\
&\leq 2\epsilon_n + 2^{-n(1 - H(q) - H(U) - 2\epsilon)}
\end{aligned}$$

for some $\epsilon_n \to 0$. The error probability goes to zero provided that $H(U) < 1 - H(q) - 2\epsilon$. □

Appendix 2



Proof of Lemma 0J 

For the multiple access channel with erasures, the NSMA capacity region,
$\mathcal{R}^{\mathrm{erasure}}_{\mathrm{NSMA}}$, is

$$\mathcal{R}^{\mathrm{erasure}}_{\mathrm{NSMA}} = \{(R_1, R_2) : R_1 + R_2 < 1 - q(1)\}.$$

Similarly, the NSMA capacity region for the additive noise multiple access channel,
$\mathcal{R}^{\mathrm{add\text{-}noise}}_{\mathrm{NSMA}}$, is

$$\mathcal{R}^{\mathrm{add\text{-}noise}}_{\mathrm{NSMA}} = \{(R_1, R_2) : R_1 + R_2 < 1 - H(Z)\}.$$

Hence, for both channels, the NSMA capacity regions are triangles and time-sharing can
achieve any point in the region. □

Proof of Theorems 7 and 8

The following argument demonstrates the construction of linear multiple access channel codes
from linear channel codes for single-transmitter, single-receiver networks:

Matrix pair $(B_{n,1}, B_{n,2})$ denotes a linear multiple access channel code with encoders

$$\gamma_1(v_1^{\lfloor nR_1 \rfloor}) = B_{n,1} v_1, \qquad \gamma_2(v_2^{\lfloor nR_2 \rfloor}) = B_{n,2} v_2.$$

We build matrices $(B_{n,1}, B_{n,2})$ from the linear code for the corresponding single-transmitter,
single-receiver channel. Let $\{B_n\}_{n=1}^{\infty}$ be a sequence of rate-$R$ single-transmitter,
single-receiver channel codes for the given channel model; then matrix pair
$(B^0_{n,1}, B^0_{n,2}) = (B_n, 0_{n \times nR})$ describes a multiple access channel code that achieves rate
pair $(R, 0)$. Similarly, matrix pair $(B^1_{n,1}, B^1_{n,2}) = (0_{n \times nR}, B_n)$ describes a multiple access
channel code achieving rate pair $(0, R)$.

The multiple access channel code achieving the $(\lambda, 1 - \lambda)$ time-sharing solution between
$(R, 0)$ and $(0, R)$ is a linear code whose encoder matrices for transmitters 1 and 2 are, respectively,

$$\begin{bmatrix}
B_{\lambda n} & 0_{\lambda n \times (1-\lambda) nR} \\
0_{(1-\lambda) n \times \lambda nR} & 0_{(1-\lambda) n \times (1-\lambda) nR}
\end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix}
0_{\lambda n \times \lambda nR} & 0_{\lambda n \times (1-\lambda) nR} \\
0_{(1-\lambda) n \times \lambda nR} & B_{(1-\lambda) n}
\end{bmatrix}.$$

We decode the first $\lambda n$ channel outputs with the decoder for $B_{\lambda n}$ and the remaining outputs
with the decoder for $B_{(1-\lambda) n}$. The resulting codes lead immediately to Theorems 7 and 8.
While the proofs of Theorems 7 and 8 take slightly different approaches, this difference is not
essential. The proof methodology that uses direct typical set decoding, rather than building
a parity-check matrix, can be adapted to the additive noise multiple access channel. □
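The block structure of the time-sharing code can be made concrete with a small sketch. The functions below are hypothetical and use plain nested lists for binary matrices; `time_share` takes a generator matrix for the first block of channel uses and another for the remaining block, and pads each with zero blocks in the pattern described in the proof.

```python
def zeros(rows, cols):
    """All-zero binary matrix as a list of rows."""
    return [[0] * cols for _ in range(rows)]

def time_share(b_first, b_second):
    """Return (B1, B2): transmitter 1 uses b_first during the first block and is
    silent afterwards; transmitter 2 is silent first and then uses b_second."""
    n1, k1 = len(b_first), len(b_first[0])
    n2, k2 = len(b_second), len(b_second[0])
    B1 = [row + [0] * k2 for row in b_first] + zeros(n2, k1 + k2)
    B2 = zeros(n1, k1 + k2) + [[0] * k1 + row for row in b_second]
    return B1, B2

def encode(B, v):
    """Multiply matrix B by the column vector v over GF(2)."""
    return [sum(b * x for b, x in zip(row, v)) % 2 for row in B]

B1, B2 = time_share([[1, 0], [0, 1], [1, 1]], [[1], [1]])
x1 = encode(B1, [1, 1, 0])  # only the first two message bits are used
x2 = encode(B2, [0, 0, 1])  # only the last message bit is used
print(x1, x2)
```

Because the two codewords are nonzero on disjoint sets of channel uses, their sum at the receiver can be split and each part handed to the corresponding single-transmitter decoder.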



Proof of Theorem 9

Let us first consider the binary multiple access channel shown in Figure a) with erasures 
that are independent of the inputs. As seen in Lemma the NSMA capacity is 



$$\mathcal{R}^{\mathrm{erasure}}_{\mathrm{NSMA}} = \{(R_1, R_2) : R_1 + R_2 < 1 - q(1)\}. \tag{18}$$

The three mutual information terms, $I(X_1; Y|X_2)$, $I(X_2; Y|X_1)$ and $I(X_1, X_2; Y)$, are
maximized by uniform distribution on $X_1$ and $X_2$. For the same channel, the CMA capacity
is

$$\mathcal{R}^{\mathrm{erasure}}_{\mathrm{CMA}} = \{(R_1, R_2) : R_1 + R_2 < 1 - q(1)\}, \tag{19}$$

where the three mutual information terms, $I(X_1; Y|X_2)$, $I(X_2; Y|X_1)$ and $I(X_1, X_2; Y)$, are
maximized by making $P(X_1 = i, X_2 = j) = \frac{1}{4}$ for $i, j \in \{0, 1\}$. Combining (18) and (19), we
obtain

$$\mathcal{R}^{\mathrm{erasure}}_{\mathrm{NSMA}} = \mathcal{R}^{\mathrm{erasure}}_{\mathrm{CMA}}.$$

Hence, by Lemma ^ separation holds. We have thus proved the theorem for the binary 
multiple access channel with erasures that are independent of the inputs. 
Let us now consider the binary multiple access channel shown in Figure Efb) with noise being 
independent of the inputs. As seen in Lemma EJ the NSMA capacity is 



$$\mathcal{R}^{\mathrm{add\text{-}noise}}_{\mathrm{NSMA}} = \{(R_1, R_2) : R_1 + R_2 < 1 - H(Z)\}. \tag{20}$$

The three mutual information terms, $I(X_1; Y|X_2)$, $I(X_2; Y|X_1)$ and $I(X_1, X_2; Y)$, are
maximized by uniform distribution on $X_1$ and $X_2$. For the same channel, the CMA capacity
is

$$\mathcal{R}^{\mathrm{add\text{-}noise}}_{\mathrm{CMA}} = \{(R_1, R_2) : R_1 + R_2 < 1 - H(Z)\}, \tag{21}$$

where the three mutual information terms, $I(X_1; Y|X_2)$, $I(X_2; Y|X_1)$ and $I(X_1, X_2; Y)$, are
maximized by making $P(X_1 = i, X_2 = j) = \frac{1}{4}$ for $i, j \in \{0, 1\}$. Combining (20) and (21), we
obtain

$$\mathcal{R}^{\mathrm{add\text{-}noise}}_{\mathrm{NSMA}} = \mathcal{R}^{\mathrm{add\text{-}noise}}_{\mathrm{CMA}}.$$

Hence, by Lemma ^ separation holds. We have thus proved the theorem for the binary 
multiple access channel with additive noise that is independent of the inputs. □ 



Proof of Theorem 10

Theorem |H1 specifies the Slepian-Wolf region for the given source as $R_1 > H(U_1|U_2)$,
$R_2 > H(U_2|U_1)$, and $R_1 + R_2 > H(U_1, U_2)$. We see from Theorem 9 that source-channel
separation holds and the CMA capacity region is the same as the NSMA capacity region. Since
(from Lemma EJ) the NSMA capacity region for the given channel is $R_1 + R_2 < 1 - q(1)$, the
theorem follows. □

Proof of Theorem 11

The proof follows in the same manner as the proof for the multiple access channel with
erasures. Here, by Theorem |Hl the Slepian-Wolf region for the given source is $R_1 > H(U_1|U_2)$,
$R_2 > H(U_2|U_1)$, and $R_1 + R_2 > H(U_1, U_2)$. We see from Theorem 9 that source-channel
separation holds and the CMA capacity region is the same as the NSMA capacity region.
Since (from Lemma 0]) the capacity region for the given channel is $R_1 + R_2 < 1 - H(Z)$, the
theorem follows. □

MIT LIDS Technical Report 2687, March 2006

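Proof of Theorem 12

Again, we begin by noting the erasure positions in $Y^n$ and using them to reconstruct $Z^n$. A
decoding error occurs if there exists a $\hat{u}_1 \neq U_1$ for which $C_{1,n}(U_1 - \hat{u}_1) \in \mathcal{E}(Z^n)$, a
$\hat{u}_2 \neq U_2$ for which $C_{2,n}(U_2 - \hat{u}_2) \in \mathcal{E}(Z^n)$, or a $\hat{u}_1 \neq U_1$ and $\hat{u}_2 \neq U_2$ for which
$C_{1,n}(U_1 - \hat{u}_1) + C_{2,n}(U_2 - \hat{u}_2) \in \mathcal{E}(Z^n)$. Thus

$$\begin{aligned}
E[P_e^{(n)}(C_{1,n}, C_{2,n})]
&= E[\Pr(\text{Error} \wedge ((U_1^n, U_2^n) \notin A_\epsilon^{(n)}(p) \vee Z^n \notin A_\epsilon^{(n)}(q)))] \\
&\quad + E[\Pr(\text{Error} \wedge (U_1^n, U_2^n) \in A_\epsilon^{(n)}(p) \wedge Z^n \in A_\epsilon^{(n)}(q))] \\
&\leq 2\epsilon_n + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}(p)} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(u_1^n, u_2^n) q(z^n) \Biggl[
  \sum_{\hat{u}_1^n \neq u_1^n : (\hat{u}_1^n, u_2^n) \in A_\epsilon^{(n)}(p)} \Pr(C_{1,n}(u_1 - \hat{u}_1) \in \mathcal{E}(z^n)) \\
&\qquad + \sum_{\hat{u}_2^n \neq u_2^n : (u_1^n, \hat{u}_2^n) \in A_\epsilon^{(n)}(p)} \Pr(C_{2,n}(u_2 - \hat{u}_2) \in \mathcal{E}(z^n)) \\
&\qquad + \sum_{\hat{u}_1^n \neq u_1^n,\, \hat{u}_2^n \neq u_2^n : (\hat{u}_1^n, \hat{u}_2^n) \in A_\epsilon^{(n)}(p)} \Pr(C_{1,n}(u_1 - \hat{u}_1) + C_{2,n}(u_2 - \hat{u}_2) \in \mathcal{E}(z^n)) \Biggr] \\
&\leq 2\epsilon_n + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}(p)} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(u_1^n, u_2^n) q(z^n) \Bigl[
  2^{n(H(U_1|U_2) + \epsilon)} 2^{-n} 2^{n(q(1) + \epsilon')} \\
&\qquad + 2^{n(H(U_2|U_1) + \epsilon)} 2^{-n} 2^{n(q(1) + \epsilon')} + 2^{n(H(U_1, U_2) + \epsilon)} 2^{-n} 2^{n(q(1) + \epsilon')} \Bigr] \\
&\leq 2\epsilon_n + 2^{-n(1 - q(1) - \epsilon' - H(U_1|U_2) - \epsilon)} + 2^{-n(1 - q(1) - \epsilon' - H(U_2|U_1) - \epsilon)} + 2^{-n(1 - q(1) - \epsilon' - H(U_1, U_2) - \epsilon)}
\end{aligned}$$

for some $\epsilon_n \to 0$. Thus the expected error probability decays to zero as $n$ grows without
bound provided that $\max\{H(U_1|U_2), H(U_2|U_1), H(U_1, U_2)\} = H(U_1, U_2) < 1 - q(1) - \epsilon - \epsilon'$.
□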

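Proof of Theorem 13

An error occurs if two values of $u_1^n$ are mapped to the same value of $x_1^n$, two values of $u_2^n$
are mapped to the same value of $x_2^n$, or if there exist distinct noise vectors that map distinct
source vectors to the same channel output. In the first case, $C_{1,n} U_1 = C_{1,n} \hat{u}_1$ for some
$\hat{u}_1 \neq U_1$; in the second case, $C_{2,n} U_2 = C_{2,n} \hat{u}_2$ for some $\hat{u}_2 \neq U_2$; and in the third case,
$C_{1,n} U_1 + Z = C_{1,n} \hat{u}_1 + \hat{z}$ for some $\hat{u}_1 \neq U_1$ and $\hat{z} \neq Z$, $C_{2,n} U_2 + Z = C_{2,n} \hat{u}_2 + \hat{z}$ for some
$\hat{u}_2 \neq U_2$ and $\hat{z} \neq Z$, or $C_{1,n} U_1 + C_{2,n} U_2 + Z = C_{1,n} \hat{u}_1 + C_{2,n} \hat{u}_2 + \hat{z}$ for some
$\hat{u}_1 \neq U_1$, $\hat{u}_2 \neq U_2$, and $\hat{z} \neq Z$. Thus, setting
$\mathcal{F}(z^n) = \{\hat{z} - z : \hat{z} \neq z, \hat{z} \in A_\epsilon^{(n)}(q)\}$ and restricting our attention
to typical error sequences, we sum up the error events as: $C_{1,n}(U_1 - \hat{u}_1) \in \{0\} \cup \mathcal{F}(Z^n)$,
$C_{2,n}(U_2 - \hat{u}_2) \in \{0\} \cup \mathcal{F}(Z^n)$, and $C_{1,n}(U_1 - \hat{u}_1) + C_{2,n}(U_2 - \hat{u}_2) \in \mathcal{F}(Z^n)$. We then
bound the expected error probability as

$$\begin{aligned}
E[P_e^{(n)}(C_{1,n}, C_{2,n})]
&= E[\Pr(\text{Error} \wedge ((U_1^n, U_2^n) \notin A_\epsilon^{(n)}(p) \vee Z^n \notin A_\epsilon^{(n)}(q)))] \\
&\quad + E[\Pr(\text{Error} \wedge (U_1^n, U_2^n) \in A_\epsilon^{(n)}(p) \wedge Z^n \in A_\epsilon^{(n)}(q))] \\
&\leq 2\epsilon_n + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}(p)} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(u_1^n, u_2^n) q(z^n) \Biggl[
  \sum_{\hat{u}_1^n \neq u_1^n : (\hat{u}_1^n, u_2^n) \in A_\epsilon^{(n)}(p)} \Pr(C_{1,n}(u_1 - \hat{u}_1) \in \{0\} \cup \mathcal{F}(z^n)) \\
&\qquad + \sum_{\hat{u}_2^n \neq u_2^n : (u_1^n, \hat{u}_2^n) \in A_\epsilon^{(n)}(p)} \Pr(C_{2,n}(u_2 - \hat{u}_2) \in \{0\} \cup \mathcal{F}(z^n)) \\
&\qquad + \sum_{\hat{u}_1^n \neq u_1^n,\, \hat{u}_2^n \neq u_2^n : (\hat{u}_1^n, \hat{u}_2^n) \in A_\epsilon^{(n)}(p)} \Pr(C_{1,n}(u_1 - \hat{u}_1) + C_{2,n}(u_2 - \hat{u}_2) \in \mathcal{F}(z^n)) \Biggr] \\
&\leq 2\epsilon_n + \sum_{(u_1^n, u_2^n) \in A_\epsilon^{(n)}(p)} \sum_{z^n \in A_\epsilon^{(n)}(q)} p(u_1^n, u_2^n) q(z^n) \Bigl[
  2^{n(H(U_1|U_2) + \epsilon)} 2^{-n} 2^{n(H(Z) + \epsilon)} \\
&\qquad + 2^{n(H(U_2|U_1) + \epsilon)} 2^{-n} 2^{n(H(Z) + \epsilon)} + 2^{n(H(U_1, U_2) + \epsilon)} 2^{-n} 2^{n(H(Z) + \epsilon)} \Bigr] \\
&\leq 2\epsilon_n + 2^{-n(1 - H(Z) - H(U_1|U_2) - 2\epsilon)} + 2^{-n(1 - H(Z) - H(U_2|U_1) - 2\epsilon)} + 2^{-n(1 - H(Z) - H(U_1, U_2) - 2\epsilon)}
\end{aligned}$$

for some $\epsilon_n \to 0$. Thus the expected error probability decays to zero as $n$ grows without
bound provided that $\max\{H(U_1|U_2), H(U_2|U_1), H(U_1, U_2)\} = H(U_1, U_2) < 1 - H(Z) - 2\epsilon$.
□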



Appendix 3 

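Proof of Theorems 14 and 15

Let us consider a noisy multiple access channel where two transmitters transmit binary $\{0, 1\}$
symbols, $X_1$ and $X_2$, to a single receiver. The received symbol $Y$ is also binary $\{0, 1\}$. Binary
additive noise $Z$ is allowed to depend on the input symbols being transmitted and has the
distribution $q_{ij} = \Pr(Z = 1 | X_1 = i, X_2 = j)$ for $i, j \in \{0, 1\}$. Define
$P_{ij} = \Pr(X_1 = i, X_2 = j)$ for $i, j \in \{0, 1\}$, $p_1 = \Pr(X_1 = 0)$ and $p_2 = \Pr(X_2 = 0)$. Let us
define the function $\mathcal{H}(\cdot)$ as

$$\mathcal{H}(q) = -q \log_2(q) - (1 - q) \log_2(1 - q) \quad \text{for } q > 0$$

and $\alpha_{00} = 1 - q_{00}$, $\alpha_{01} = q_{01}$, $\alpha_{10} = q_{10}$, $\alpha_{11} = 1 - q_{11}$. Since $q_{ij}$ are probabilities,
$\alpha_{ij} \in [0, 1]$ for $i, j \in \{0, 1\}$. Note that $(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$ characterizes a particular
multiple access channel. We compute $R^{\mathrm{sum}}_{\mathrm{CMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$ and
$R^{\mathrm{sum}}_{\mathrm{NSMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$ as

$$R^{\mathrm{sum}}_{\mathrm{CMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) = \max_{P_{00}, P_{01}, P_{10}, P_{11}} R_{\mathrm{CMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}, P_{00}, P_{01}, P_{10}, P_{11}), \tag{22}$$

where

$$\begin{aligned}
R_{\mathrm{CMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}, P_{00}, P_{01}, P_{10}, P_{11})
&= \mathcal{H}[P_{00}\alpha_{00} + P_{01}\alpha_{01} + P_{10}\alpha_{10} + P_{11}\alpha_{11}] \\
&\quad - P_{00}\mathcal{H}(\alpha_{00}) - P_{01}\mathcal{H}(\alpha_{01}) - P_{10}\mathcal{H}(\alpha_{10}) - P_{11}\mathcal{H}(\alpha_{11}),
\end{aligned} \tag{23}$$

and

$$R^{\mathrm{sum}}_{\mathrm{NSMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) = \max_{p_1, p_2} R_{\mathrm{NSMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}, p_1, p_2), \tag{24}$$

where

$$\begin{aligned}
R_{\mathrm{NSMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}, p_1, p_2)
&= \mathcal{H}[p_1 p_2 \alpha_{00} + p_1(1 - p_2)\alpha_{01} + p_2(1 - p_1)\alpha_{10} + (1 - p_1)(1 - p_2)\alpha_{11}] \\
&\quad - p_1 p_2 \mathcal{H}(\alpha_{00}) - p_1(1 - p_2)\mathcal{H}(\alpha_{01}) - p_2(1 - p_1)\mathcal{H}(\alpha_{10}) - (1 - p_1)(1 - p_2)\mathcal{H}(\alpha_{11}).
\end{aligned} \tag{25}$$

The difference between the CMA and NSMA multiple access sum capacities,
$G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$, is

$$G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) = R^{\mathrm{sum}}_{\mathrm{CMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) - R^{\mathrm{sum}}_{\mathrm{NSMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}).$$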



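A.3.1 An example where the CMA and NSMA capacity regions differ

Consider a channel parameterized by $\alpha_{00} = 0$, $\alpha_{01} = 0.5$, $\alpha_{10} = 0.5$ and $\alpha_{11} = 1$. This
choice of conditional noise probabilities makes the noise input-dependent. We compute from
(22) and (24)

$$\begin{aligned}
R^{\mathrm{sum}}_{\mathrm{CMA}}(0, 0.5, 0.5, 1) &= 1, \\
R^{\mathrm{sum}}_{\mathrm{NSMA}}(0, 0.5, 0.5, 1) &= \tfrac{1}{2}, \\
G(0, 0.5, 0.5, 1) &= \tfrac{1}{2}.
\end{aligned}$$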

Now, we have a channel where the cooperative and separate sum capacities are unequal. 
This example shows that when noise is allowed to depend on the channel inputs, the NSMA 
and CMA capacity regions may not be the same. Under this scenario, separation may fail 
for some source-channel pairs. 
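The sum capacities for this example can be checked numerically. The sketch below evaluates expressions (23) and (25) on a grid; the function names and the grid resolution are choices made here for illustration, and a grid search only approximates the maxima in general, although for this channel the maximizers happen to lie on the grid.

```python
import math

def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def r_cma(alpha, P):
    """Expression (23); alpha and P are ordered as (00, 01, 10, 11)."""
    mix = sum(p * a for p, a in zip(P, alpha))
    return h2(mix) - sum(p * h2(a) for p, a in zip(P, alpha))

def r_nsma(alpha, p1, p2):
    """Expression (25); independent inputs with p1 = Pr(X1 = 0), p2 = Pr(X2 = 0)."""
    P = (p1 * p2, p1 * (1 - p2), p2 * (1 - p1), (1 - p1) * (1 - p2))
    return r_cma(alpha, P)

def sum_capacities(alpha, steps=20):
    """Grid-search approximations of the CMA and NSMA sum capacities."""
    grid = [i / steps for i in range(steps + 1)]
    cma = max(
        r_cma(alpha, (i / steps, j / steps, k / steps, (steps - i - j - k) / steps))
        for i in range(steps + 1)
        for j in range(steps + 1 - i)
        for k in range(steps + 1 - i - j)
    )
    nsma = max(r_nsma(alpha, p1, p2) for p1 in grid for p2 in grid)
    return cma, nsma

cma, nsma = sum_capacities((0.0, 0.5, 0.5, 1.0))
print(cma, nsma, cma - nsma)
```

The cooperative maximum is attained by putting probability 1/2 on each of the input pairs (0,0) and (1,1), which independent inputs cannot reproduce without also placing mass on the noisy pairs (0,1) and (1,0).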



A.3.2 Maximum difference between the CMA and NSMA sum capacities

We now find the maximum difference between the CMA and NSMA sum capacities for binary
multiple access channels with additive noise. For this, we need to evaluate

$$\max_{\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11} \in [0, 1]} G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}).$$

Henceforth, we refer to "sum capacity" as "capacity" for brevity.

Characteristics of cooperative capacity achieving joint input distribution

Let us establish the characteristics of the joint input distribution that achieves the
cooperative capacity. If $(p_1, p_2)$ achieves the separate capacity and $(P_{00}, P_{01}, P_{10}, P_{11})$ achieves
the cooperative capacity of a channel, then the separate and cooperative capacities are the same,
$R^{\mathrm{sum}}_{\mathrm{NSMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) = R^{\mathrm{sum}}_{\mathrm{CMA}}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$, for all
$\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11} \in [0, 1]$ if and only if

$$\begin{aligned}
p_1 p_2 &= P_{00}, && (26) \\
p_1 (1 - p_2) &= P_{01}, && (27) \\
p_2 (1 - p_1) &= P_{10}, && (28) \\
(1 - p_1)(1 - p_2) &= P_{11}, && (29)
\end{aligned}$$

by (23) and (25). For (26)-(29) to hold, we need

$$P_{11} P_{00} = P_{01} P_{10}, \tag{30}$$

since the product of (26) and (29) and the product of (27) and (28) both equal
$p_1 p_2 (1 - p_1)(1 - p_2)$. Thus, for any channel, whenever the joint input distribution achieving
the CMA capacity obeys (30), the cooperative and separate capacities are the same.

Maximizing the cooperative mutual information 

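We next prove Lemma 7, which identifies the joint input distribution that achieves the CMA
capacity for an arbitrary binary multiple access channel. The following definitions and lemma
are useful for that proof. Define $\alpha_{\min} = \min\{\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}\}$,
$\alpha_{\max} = \max\{\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}\}$ and
$\alpha_1, \alpha_2 \in \{\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}\} - \{\alpha_{\min}, \alpha_{\max}\}$. Therefore, $\alpha_{\min}$ and
$\alpha_{\max}$ are the smallest and largest $\alpha_{ij}$, respectively, where $i, j \in \{0, 1\}$ and
$\alpha_{\min} \leq \alpha_1, \alpha_2 \leq \alpha_{\max}$.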

Lemma 6 There exists p G [0, 1], such that 

PcMA( a min, «1, «2, "max,??, 0, 0, 1 - p) > R CMA { 

where (p' min , p[, p' 2 i Pmax) specifies a joint input probability distribution. 
Proof: Choose p such that 

P«min + (1 - P)a max = p' min a min + p x OL\ + p' 2 a 2 + Pmax^max- (31) 

This is possible, since p G [0, 1]. Now, using (J2SI), the lemma holds if 

pTL(a min ) + (1 -p)H(a mSLX ) < p min H(a min ) +p' 1 TL(a 1 ) + p' 2 H(a 2 ) + P m ^(a max ) . (32) 



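Solving (31) for $p$ and substituting into (32) shows that

$$0 \le p'_1\left[\mathcal{H}(\alpha_1) - \mathcal{H}(\alpha_{\max}) + \frac{\alpha_1 - \alpha_{\max}}{\alpha_{\max} - \alpha_{\min}}\{\mathcal{H}(\alpha_{\min}) - \mathcal{H}(\alpha_{\max})\}\right] + p'_2\left[\mathcal{H}(\alpha_2) - \mathcal{H}(\alpha_{\max}) + \frac{\alpha_2 - \alpha_{\max}}{\alpha_{\max} - \alpha_{\min}}\{\mathcal{H}(\alpha_{\min}) - \mathcal{H}(\alpha_{\max})\}\right] \tag{33}$$

is a necessary and sufficient condition for the lemma to hold. Since $\mathcal{H}(\cdot)$ is a concave function,

$$0 \le \mathcal{H}(\alpha_1) - \mathcal{H}(\alpha_{\max}) + \frac{\alpha_1 - \alpha_{\max}}{\alpha_{\max} - \alpha_{\min}}\{\mathcal{H}(\alpha_{\min}) - \mathcal{H}(\alpha_{\max})\},$$
$$0 \le \mathcal{H}(\alpha_2) - \mathcal{H}(\alpha_{\max}) + \frac{\alpha_2 - \alpha_{\max}}{\alpha_{\max} - \alpha_{\min}}\{\mathcal{H}(\alpha_{\min}) - \mathcal{H}(\alpha_{\max})\},$$

since $\alpha_1, \alpha_2 \in [\alpha_{\min}, \alpha_{\max}]$. Moreover, $p'_1, p'_2 \ge 0$, which implies that (33) holds. The proof is now complete. □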
We now prove Lemma 7. Let us define a function Ind(.) that extracts the indices of its argument. For example, $\mathrm{Ind}(\alpha_{ij}) = (i, j)$.

Lemma 7 $R_{CMA}(\alpha_{\min},\alpha_1,\alpha_2,\alpha_{\max}, P_{00}, P_{01}, P_{10}, P_{11})$ is maximized by the joint probability distribution $(p_{\min}, 0, 0, 1 - p_{\min})$ where

$$p_{\min} = P_{ij}$$

for the indices such that $\alpha_{\min} = \alpha_{ij}$.

Proof: For a joint distribution $(p'_{\min}, p'_1, p'_2, p'_{\max})$, we have

$$R^{CMA}_{sum}(\alpha_{\min},\alpha_1,\alpha_2,\alpha_{\max}) = \max_{p'_{\min},\,p'_1,\,p'_2,\,p'_{\max}} R_{CMA}(\alpha_{\min},\alpha_1,\alpha_2,\alpha_{\max}, p'_{\min}, p'_1, p'_2, p'_{\max})$$

and $\alpha_{\min} \le \alpha_1, \alpha_2 \le \alpha_{\max}$. Using Lemma 6, we have

$$R^{CMA}_{sum}(\alpha_{\min},\alpha_1,\alpha_2,\alpha_{\max}) = \max_{p \in [0,1]} R_{CMA}(\alpha_{\min},\alpha_1,\alpha_2,\alpha_{\max}, p, 0, 0, 1-p). \tag{34}$$

Let (34) be maximized at $p = q^*$. Note that $q^*$ multiplies $\alpha_{\min}$ and $(1 - q^*)$ multiplies $\alpha_{\max}$. We define $p_{\min} = q^*$. Thus, the cooperative mutual information is maximized by the probability distribution $(p_{\min}, 0, 0, 1 - p_{\min})$. The proof is now complete. □
Since $p_{\min}$ multiplies $\alpha_{\min}$ and $1 - p_{\min}$ multiplies $\alpha_{\max}$, the following corollary follows:

Corollary 2 $R^{CMA}_{sum}(\alpha_{00},\alpha_{01},\alpha_{10},\alpha_{11})$ depends only on $\alpha_{\min}$ and $\alpha_{\max}$.

For a channel where $\alpha_{ij} = \alpha_{i'j'}$ for some $(i,j) \ne (i',j')$, $i, j, i', j' \in \{0, 1\}$, there is more than one input probability distribution that achieves the cooperative capacity. Equation (30) must hold for at least one of these distributions for the cooperative and separate capacities to be the same. Hence, while optimizing, when we have more than one choice, we choose $p_{\min}$ such that (30) holds. If (30) does not hold for any of the choices, then the CMA capacity is strictly greater than the NSMA capacity.



Probability that the cooperative and separate capacities are unequal 

We saw in Corollary 2 that $R^{CMA}_{sum}(\alpha_{00},\alpha_{01},\alpha_{10},\alpha_{11})$ depends only on $\alpha_{\min}$ and $\alpha_{\max}$. This implies that only two of the cooperative capacity achieving joint input probabilities, $P_{ij}$, are non-zero. Thus, (30) does not hold when either $P_{00} = P_{11} = 0$ or $P_{01} = P_{10} = 0$. The former condition occurs when $(\alpha_{\min}, \alpha_{\max}) \in \{(\alpha_{01}, \alpha_{10}), (\alpha_{10}, \alpha_{01})\}$ and the latter occurs when $(\alpha_{\min}, \alpha_{\max}) \in \{(\alpha_{00}, \alpha_{11}), (\alpha_{11}, \alpha_{00})\}$. Combining the two, we see that the CMA capacity is strictly larger than the NSMA capacity whenever

$$(\alpha_{\min}, \alpha_{\max}) \in \{(\alpha_{00}, \alpha_{11}), (\alpha_{11}, \alpha_{00}), (\alpha_{01}, \alpha_{10}), (\alpha_{10}, \alpha_{01})\}.$$

Therefore, the separate and cooperative capacities are unequal if and only if one of the following events occurs:

$$\mathcal{E}_1: \alpha_{00} < \alpha_{01}, \alpha_{10} < \alpha_{11}, \qquad \mathcal{E}_2: \alpha_{11} < \alpha_{01}, \alpha_{10} < \alpha_{00},$$
$$\mathcal{E}_3: \alpha_{01} < \alpha_{00}, \alpha_{11} < \alpha_{10}, \qquad \mathcal{E}_4: \alpha_{10} < \alpha_{00}, \alpha_{11} < \alpha_{01}.$$

Since the four events are disjoint, the probability that the capacities are unequal for a channel picked randomly from the ensemble of all such channels, $P_{unequal}$, is

$$P_{unequal} = \sum_{i=1}^{4} \Pr(\mathcal{E}_i).$$

If $\alpha_{00}$, $\alpha_{01}$, $\alpha_{10}$ and $\alpha_{11}$ are independent and identically distributed in $[0,1]$,

$$\Pr(\mathcal{E}_i) = \frac{1}{12} \quad \text{for } i \in \{1,2,3,4\}.$$

Thus,

$$P_{unequal} = \frac{1}{3}.$$

We have thus proved the theorem.
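The counting argument can be checked by simulation. For four independent draws from a continuous distribution, a given parameter is the smallest and another given parameter is the largest with probability $1/(4 \cdot 3) = 1/12$, and the four disjoint events above therefore have total probability $1/3$. The sketch below is a Monte Carlo estimate under the assumption that the four parameters are uniform on $[0,1]$; the function names, trial count and seed are illustrative choices.

```python
import random

def capacities_differ(a00, a01, a10, a11):
    """True when the smallest and largest parameters are {a00, a11} or {a01, a10}."""
    vals = {"00": a00, "01": a01, "10": a10, "11": a11}
    lo = min(vals, key=vals.get)
    hi = max(vals, key=vals.get)
    return {lo, hi} in ({"00", "11"}, {"01", "10"})

def estimate_p_unequal(trials=200_000, seed=0):
    """Fraction of uniformly drawn channels for which the capacities differ."""
    rng = random.Random(seed)
    hits = sum(capacities_differ(*(rng.random() for _ in range(4)))
               for _ in range(trials))
    return hits / trials

estimate = estimate_p_unequal()
print(estimate)
```

The estimate should lie close to 1/3; it is a sanity check on the event counting, not a substitute for the proof.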



Maximum difference between the cooperative and separate capacities 

We have seen that the cooperative and separate capacities are not the same when

$$(\alpha_{\min}, \alpha_{\max}) \in \{(\alpha_{00}, \alpha_{11}), (\alpha_{11}, \alpha_{00}), (\alpha_{01}, \alpha_{10}), (\alpha_{10}, \alpha_{01})\}.$$

We classify channels for which this happens into two types. Type 1 channels satisfy $(\alpha_{\min}, \alpha_{\max}) \in \{(\alpha_{00},\alpha_{11}), (\alpha_{11},\alpha_{00})\}$. Type 2 channels satisfy $(\alpha_{\min}, \alpha_{\max}) \in \{(\alpha_{01},\alpha_{10}), (\alpha_{10},\alpha_{01})\}$. Consider a type 2 channel, $C^2$, parameterized by $(\alpha^{C^2}_{00}, \alpha^{C^2}_{01}, \alpha^{C^2}_{10}, \alpha^{C^2}_{11})$. Now, consider another channel, $C^*$, whose parameters are $(\alpha^{C^*}_{00}, \alpha^{C^*}_{01}, \alpha^{C^*}_{10}, \alpha^{C^*}_{11})$ such that

$$\alpha^{C^*}_{00} = \alpha^{C^2}_{01}, \quad \alpha^{C^*}_{01} = \alpha^{C^2}_{00}, \quad \alpha^{C^*}_{10} = \alpha^{C^2}_{11}, \quad \alpha^{C^*}_{11} = \alpha^{C^2}_{10}.$$

Thus, $(\alpha^{C^*}_{\min}, \alpha^{C^*}_{\max}) \in \{(\alpha^{C^*}_{00}, \alpha^{C^*}_{11}), (\alpha^{C^*}_{11}, \alpha^{C^*}_{00})\}$ and this new channel is of type 1. Therefore, we define $C^* = C^1$. (We use the superscript to designate the channel type.) Note that, since the cooperative capacity depends only on the highest and lowest $\alpha_{ij}$,

$$R^{CMA}_{sum}(\alpha^{C^1}_{00}, \alpha^{C^1}_{01}, \alpha^{C^1}_{10}, \alpha^{C^1}_{11}) = R^{CMA}_{sum}(\alpha^{C^2}_{00}, \alpha^{C^2}_{01}, \alpha^{C^2}_{10}, \alpha^{C^2}_{11}). \tag{35}$$

Moreover, if a particular separate capacity is achieved for $C^2$ by a probability distribution $(p_1, p_2)$, the same separate capacity can be achieved for $C^1$ by a probability distribution $(p_1, 1 - p_2)$. Thus,

$$R^{NSMA}_{sum}(\alpha^{C^1}_{00}, \alpha^{C^1}_{01}, \alpha^{C^1}_{10}, \alpha^{C^1}_{11}) = R^{NSMA}_{sum}(\alpha^{C^2}_{00}, \alpha^{C^2}_{01}, \alpha^{C^2}_{10}, \alpha^{C^2}_{11}). \tag{36}$$

From (35) and (36), we have

$$G(\alpha^{C^1}_{00}, \alpha^{C^1}_{01}, \alpha^{C^1}_{10}, \alpha^{C^1}_{11}) = G(\alpha^{C^2}_{00}, \alpha^{C^2}_{01}, \alpha^{C^2}_{10}, \alpha^{C^2}_{11}).$$

Therefore, for every type 2 channel, there exists a type 1 channel with the same loss in capacity. Hence, while finding the maximum loss, we can confine our analysis to type 1 channels.

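Consider two type 1 channels, $C^1_1$ and $C^1_2$, parameterized by $(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$ and $(\alpha_{00}, 0.5, 0.5, \alpha_{11})$, respectively. As both $C^1_1$ and $C^1_2$ have the same $(\alpha_{\min}, \alpha_{\max})$, we have from Corollary 2

$$R^{CMA}_{sum}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) = R^{CMA}_{sum}(\alpha_{00}, 0.5, 0.5, \alpha_{11}). \tag{37}$$

Since the noise entropy for $C^1_1$ cannot exceed the noise entropy for $C^1_2$, i.e.,

$$H_{C^1_1}(Z \mid X_1 = 0, X_2 = 1) \le H_{C^1_2}(Z \mid X_1 = 0, X_2 = 1) = 1,$$
$$H_{C^1_1}(Z \mid X_1 = 1, X_2 = 0) \le H_{C^1_2}(Z \mid X_1 = 1, X_2 = 0) = 1,$$

we have

$$R^{NSMA}_{sum}(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) \ge R^{NSMA}_{sum}(\alpha_{00}, 0.5, 0.5, \alpha_{11}). \tag{38}$$

Combining (37) and (38) gives

$$G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) \le G(\alpha_{00}, 0.5, 0.5, \alpha_{11}). \tag{39}$$

Thus, in finding the largest loss, it is sufficient to focus on channels parameterized by $(\alpha, 0.5, 0.5, \beta)$ for $\alpha, \beta \in [0,1]$. We define

$$R^{CMA}_{sum}(\alpha, \beta) = R^{CMA}_{sum}(\alpha, 0.5, 0.5, \beta),$$
$$R^{NSMA}_{sum}(\alpha, \beta) = R^{NSMA}_{sum}(\alpha, 0.5, 0.5, \beta),$$
$$G(\alpha, \beta) = G(\alpha, 0.5, 0.5, \beta).$$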



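Therefore, the cooperative capacity given by (22) and (23) becomes

$$R^{CMA}_{sum}(\alpha, \beta) = \max_{P_{00} \in [0,1]} \mathcal{H}[P_{00}\alpha + (1 - P_{00})\beta] - P_{00}\mathcal{H}(\alpha) - (1 - P_{00})\mathcal{H}(\beta). \tag{40}$$

The maximum of (40) occurs at

$$P^*_{00} = \frac{\beta}{\beta - \alpha} - \frac{1}{(\beta - \alpha)(1 + \exp(\phi))},$$

where

$$\phi = \frac{\mathcal{H}(\beta) - \mathcal{H}(\alpha)}{\beta - \alpha}.$$

Substituting, we obtain

$$R^{CMA}_{sum}(\alpha, \beta) = \mathcal{H}\left(\frac{1}{1 + \exp(\phi)}\right) - \frac{\phi}{1 + \exp(\phi)} + \phi\beta - \mathcal{H}(\beta). \tag{41}$$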
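The closed-form maximizer can be compared with a direct search over $P_{00}$. The sketch below is only a numerical sanity check under stated assumptions: entropies are measured in bits, so the stationarity condition $\mathcal{H}'(x) = \phi$ gives $(1-x)/x = 2^{\phi}$ and `2 ** phi` takes the place of $\exp(\phi)$ (the two coincide when entropies are measured in nats). The function names and the test point $(\alpha, \beta) = (0.1, 0.8)$ are illustrative.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def coop_rate(P00, alpha, beta):
    """Objective of (40) evaluated at a given P00."""
    return (h2(P00 * alpha + (1 - P00) * beta)
            - P00 * h2(alpha) - (1 - P00) * h2(beta))

def closed_form_P00(alpha, beta):
    """Stationary point of (40); 2**phi replaces exp(phi) because
    entropies here are in bits (an assumption of this sketch)."""
    phi = (h2(beta) - h2(alpha)) / (beta - alpha)
    return beta / (beta - alpha) - 1 / ((beta - alpha) * (1 + 2 ** phi))

def grid_argmax_P00(alpha, beta, steps=100_000):
    """Brute-force maximizer of (40) on a uniform grid over [0, 1]."""
    return max((i / steps for i in range(steps + 1)),
               key=lambda P: coop_rate(P, alpha, beta))

alpha, beta = 0.1, 0.8
p_closed = closed_form_P00(alpha, beta)
p_grid = grid_argmax_P00(alpha, beta)
print(p_closed, p_grid)
```

Because the objective is concave in $P_{00}$, the grid maximizer and the closed-form value should agree to within the grid spacing.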

The sum rate achievable by separate source-channel coding given by (24) and (25) is

$$R^{NSMA}_{sum}(\alpha, \beta) = \max_{p_1, p_2} \mathcal{H}[p_1 p_2(\alpha + \beta - 1) + (p_1 + p_2)(0.5 - \beta) + \beta] - p_1 p_2[\mathcal{H}(\alpha) + \mathcal{H}(\beta) - 2] - (p_1 + p_2)[1 - \mathcal{H}(\beta)] - \mathcal{H}(\beta).$$

Computing $R^{NSMA}_{sum}(\alpha, \beta)$ is difficult in general since the maximizing probability distribution is hard to evaluate. Hence, let us define $R^{D}_{NSMA}(\alpha, \beta)$ as the sum rate due to separate source-channel coding where $(p_1, p_2)$ are chosen to minimize the Euclidean distance between the points $(P^*_{00}, 1 - P^*_{00})$ and $(p_1 p_2, (1 - p_1)(1 - p_2))$. We further impose the constraint that $p_1 = p_2 = p$. By using a probability distribution that minimizes Euclidean distance, we have a lower bound on the separate capacity and an upper bound on the difference between the cooperative and separate capacity. Later, we show with an example that the bound is achieved. Thus, minimizing squared Euclidean distance,

$$p^* = \arg\min_{p}\,[(P^*_{00} - p^2)^2 + ((1 - P^*_{00}) - (1 - p)^2)^2],$$

and

$$R^{D}_{NSMA}(\alpha, \beta) = \mathcal{H}[p^{*2}\alpha + p^*(1 - p^*) + (1 - p^*)^2\beta] - p^{*2}\mathcal{H}(\alpha) - 2p^*(1 - p^*) - (1 - p^*)^2\mathcal{H}(\beta). \tag{42}$$

Define $G_D(\alpha, \beta)$ as

$$G_D(\alpha, \beta) = R^{CMA}_{sum}(\alpha, \beta) - R^{D}_{NSMA}(\alpha, \beta). \tag{43}$$

The probability distribution that minimizes Euclidean distance and assumes $p_1 = p_2$ cannot yield a mutual information higher than the separate capacity. Therefore,

$$R^{D}_{NSMA}(\alpha, \beta) \le R^{NSMA}_{sum}(\alpha, \beta),$$

and

$$G(\alpha, \beta) \le G_D(\alpha, \beta). \tag{44}$$

Combining (39) and (44), we find that for every $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11} \in [0,1]$, there exist $\alpha, \beta \in [0,1]$ such that

$$G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) \le G_D(\alpha, \beta). \tag{45}$$
Minimizing Squared Euclidean Distance

Let

$$d^2 = (P^*_{00} - p^2)^2 + ((1 - P^*_{00}) - (1 - p)^2)^2.$$

Setting

$$\frac{\mathrm{d}\,d^2}{\mathrm{d}p} = 0,$$

we obtain

$$2p^3 - 3p^2 + 2p - P^*_{00} = 0.$$

We have three roots for this polynomial. Two are complex and should be omitted. The real root is given by

$$p^* = \frac{1}{2} - \frac{1}{2^{2/3}\sqrt[3]{108P^*_{00} - 54 + \sqrt{108 + (108P^*_{00} - 54)^2}}} + \frac{\sqrt[3]{108P^*_{00} - 54 + \sqrt{108 + (108P^*_{00} - 54)^2}}}{2^{1/3} \cdot 6}. \tag{46}$$
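The closed-form root can be checked by substituting it back into the cubic $2p^3 - 3p^2 + 2p - P^*_{00} = 0$. The sketch below does this for several values of $P^*_{00}$; the function names are illustrative. The argument of the cube root is always positive because $\sqrt{108 + d^2} > |d|$, so a real fractional power suffices.

```python
import math

def p_star(P00):
    """Real root of 2p^3 - 3p^2 + 2p - P00 = 0 via the closed form (46)."""
    d = 108 * P00 - 54
    c = (d + math.sqrt(108 + d * d)) ** (1 / 3)   # argument is always > 0
    return 0.5 - 1 / (2 ** (2 / 3) * c) + c / (6 * 2 ** (1 / 3))

def cubic(p, P00):
    """Left-hand side of the stationarity condition for d^2."""
    return 2 * p**3 - 3 * p**2 + 2 * p - P00

residuals = [abs(cubic(p_star(k / 10), k / 10)) for k in range(11)]
print(max(residuals), p_star(0.5))
```

At $P^*_{00} = 1/2$ the root is $p^* = 1/2$, which is the value used when the bound is evaluated at the corners of the unit square.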

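From (40), (42), (43) and (46), we have the explicit expression for $G_D(\alpha, \beta)$. We need to find the maximum value of $G_D(\alpha, \beta)$ over the unit square $\alpha, \beta \in [0, 1]$. We have the following lemma:

Lemma 8

$$\max_{\alpha, \beta \in [0,1]} G_D(\alpha, \beta) = G_D(0, 1) = G_D(1, 0) = \frac{1}{2}.$$

Proof: For $\alpha, \beta \in (0, 1)$, we have

$$\frac{\partial G_D(\alpha, \beta)}{\partial \alpha} \ne 0, \qquad \frac{\partial G_D(\alpha, \beta)}{\partial \beta} \ne 0.$$

Therefore, no critical points of $G_D(\alpha, \beta)$ lie inside the square region and the maximum values of $G_D(\alpha, \beta)$ must lie on the sides of the square. We obtain the following properties of $G_D(\alpha, \beta)$ over the four sides of the square:

$$\alpha = 0,\ \beta \in [0,1]: \quad \frac{\partial G_D(0, \beta)}{\partial \beta} > 0,$$
$$\beta = 1,\ \alpha \in [0,1]: \quad \frac{\partial G_D(\alpha, 1)}{\partial \alpha} < 0,$$
$$\alpha = 1,\ \beta \in [0,1]: \quad \frac{\partial G_D(1, \beta)}{\partial \beta} < 0,$$
$$\beta = 0,\ \alpha \in [0,1]: \quad \frac{\partial G_D(\alpha, 0)}{\partial \alpha} > 0.$$

Therefore, $G_D(\alpha, \beta)$ takes its maximum values at

$$(\alpha = 0, \beta = 1), \qquad (\alpha = 1, \beta = 0).$$

Using (41), (42), (43) and (46), we evaluate $G_D(\cdot, \cdot)$ at these points as

$$G_D(0, 1) = G_D(1, 0) = \frac{1}{2}.$$

This completes the proof. □
Using Lemma 8 and (45), we have for $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11} \in [0, 1]$,

$$G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}) \le \frac{1}{2}. \tag{47}$$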

Tightness of the bound 

We now show that the bound on $G(\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11})$ is achieved. For the channel considered in Section A.3.1, we have

$$G(0, 0.5, 0.5, 1) = \frac{1}{2}.$$

Now, we have a channel whose CMA capacity is greater than the NSMA capacity by the bound in (47). The bound in (47) is thus achieved.

Therefore, for noisy multiple access binary additive noise channels, the maximum difference between the CMA and NSMA sum capacities is 1/2 bit per channel use. This completes the proof of the theorem. □
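The value $G(0, 0.5, 0.5, 1) = 1/2$ can be reproduced by brute force. The sketch below evaluates a mutual information of the same form as (40), namely $\mathcal{H}[\sum P_{ij}\alpha_{ij}] - \sum P_{ij}\mathcal{H}(\alpha_{ij})$, over a grid of joint distributions (cooperative case) and over a grid of product distributions (separate case). Entropies in bits, the grid resolutions and all function names are assumptions of this illustration.

```python
import math
from itertools import product

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def mutual_info(P, alphas):
    """H[sum P_ij a_ij] - sum P_ij H(a_ij), for P = (P00, P01, P10, P11)."""
    mean = sum(p * a for p, a in zip(P, alphas))
    return h2(mean) - sum(p * h2(a) for p, a in zip(P, alphas))

def separate_capacity(alphas, steps=100):
    """Maximum over product input distributions, as in (26)-(29)."""
    grid = [i / steps for i in range(steps + 1)]
    return max(mutual_info((p1 * p2, p1 * (1 - p2), p2 * (1 - p1),
                            (1 - p1) * (1 - p2)), alphas)
               for p1, p2 in product(grid, grid))

def cooperative_capacity(alphas, steps=20):
    """Maximum over all joint input distributions on a coarse simplex grid."""
    best = 0.0
    for i, j, k in product(range(steps + 1), repeat=3):
        if i + j + k <= steps:
            P = (i / steps, j / steps, k / steps, (steps - i - j - k) / steps)
            best = max(best, mutual_info(P, alphas))
    return best

alphas = (0.0, 0.5, 0.5, 1.0)
gap = cooperative_capacity(alphas) - separate_capacity(alphas)
print(gap)
```

For this channel the cooperative optimum puts mass 1/2 on each of the inputs 00 and 11, while the best product distribution is $p_1 = p_2 = 1/2$, so both optima lie on the grids used here.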



Appendix 4 

A. 4.1 Code construction 

By definition, $l_a \ge n_a$ and $l_b \ge n_b$. There exist simple codes with $l_a = l_b = n_a + n_b$ for which $v_a = n_a$ and $v_b = n_b$, thus all elements can be recovered after multiple access interference at the receiver. The region ABCD shown in Figure 13 illustrates these constraints. All achievable points outside ABCD have a lower code rate since the number of received bits remains the same while $l_a$ and $l_b$ increase. Hence, we confine our analysis to the region ABCD.

In this region $n_a \le l_a \le n_a + n_b$ and $n_b \le l_b \le n_a + n_b$, which makes $0 \le m_a \le n_b$, $0 \le m_b \le n_a$, $0 \le v_a \le n_a$ and $0 \le v_b \le n_b$.

Let $W_a = (T L_a)$ and $W_b = (T L_b)$. We define a 1row as a row vector having only one non-zero bit. Since $R$ contains $v_a$ bits of $a$ and $v_b$ bits of $b$, $W_a$ should be a $(v_a + v_b) \times n_a$ size matrix with $v_a$ 1rows, $W_b$ should be a $(v_a + v_b) \times n_b$ size matrix with $v_b$ 1rows and the 1row positions for these matrices should not overlap. Let $W = [W_a | W_b]$ and $L = [L_a | L_b]$. Then, $W$ should have $v_a + v_b$ unique 1rows. Looking at the relations obtained from our model, we see that $W$ is generated by receiver matrix $T$ operating on $L$, and the rows of $W$ are linear combinations of the rows of $L$. By definition, $W$ consists only of 1rows. For a given $L$, we need to find the maximal number of 1rows in $W$ that can be generated by linear combinations of the rows of $L$. This maximizes $v_a + v_b$ for a given $l_a + l_b$, which in turn maximizes the code rate and also specifies $L_a$, $L_b$ and $T$. Therefore, codes that achieve the maximal code rate and sum rate are found by jointly optimizing the encoder and decoder matrices. In our further discussion, $I_{k \times k}$ represents a $k \times k$ identity matrix and $0_{p_1 \times p_2}$ a $p_1 \times p_2$ null matrix. In this subsection (A.4.1), we will prove Lemmas 9-12. Note that the scope of these lemmas is limited to the particular systematic code construction we are considering in this subsection. We start by proving the following lemma:

Figure 13: Region of Analysis.

Lemma 9 Let $B_1$ and $B_2$ be diagonal square matrices of size $n_b$, with all diagonal elements non-zero. If $s \in [0, n_a]$ unique 1rows are added into matrix $J = [B_1 \; 0_{n_b \times (n_a - n_b)} \; B_2]$ to obtain matrix $L$ such that the number of independent row vectors in $L$ is $s + n_b$ and $k \in [0, n_a - n_b]$ 1rows have their non-zero element in $[n_b + 1, n_a]$, then the maximal number of 1rows possible by any linear combination of the rows of $L$ is $2s - k$.

Proof: The inserted 1rows that are non-zero in positions $[n_b + 1, n_a]$ cannot give rise to any other 1row in $L$ since all the rows of $J$ are 0 in that position. Any other inserted 1row can be combined with a row vector in $J$ to give a unique 1row in $L$ since all the row vectors of $L$ are independent. Thus, the $s - k$ distinct 1rows whose non-zero elements are not in the interval $[n_b + 1, n_a]$ can generate $2(s - k)$ 1rows in $L$. The total number of 1rows that can be generated by the $s$ inserted 1rows is $2(s - k) + k = 2s - k$. □
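Lemma 9 can be illustrated on a small binary instance by enumerating the row span directly. The sketch below assumes $B_1 = B_2 = I$ over GF(2), takes $n_a = 3$, $n_b = 2$, and counts the distinct weight-one vectors among all non-empty sums of rows; rows are stored as integer bitmasks and the helper names are hypothetical.

```python
from itertools import combinations

def unit(pos, width):
    """Bitmask of a 1row whose single 1 sits at 1-based position pos."""
    return 1 << (width - pos)

def count_1rows_in_span(rows):
    """Number of distinct weight-one vectors in the GF(2) row span."""
    found = set()
    for r in range(1, len(rows) + 1):
        for subset in combinations(rows, r):
            v = 0
            for row in subset:
                v ^= row                      # addition over GF(2)
            if bin(v).count("1") == 1:
                found.add(v)
    return len(found)

def build_J(n_a, n_b):
    """Rows of J = [I 0 I] with B1 = B2 = I (n_b rows, n_a + n_b columns)."""
    width = n_a + n_b
    return [unit(i, width) | unit(n_a + i, width) for i in range(1, n_b + 1)]

n_a, n_b = 3, 2
width = n_a + n_b
J = build_J(n_a, n_b)
# s = 2 inserted 1rows, one of them (position 3) inside [n_b + 1, n_a]: k = 1.
case_k1 = count_1rows_in_span(J + [unit(1, width), unit(3, width)])
# s = 2 inserted 1rows, none inside [n_b + 1, n_a]: k = 0.
case_k0 = count_1rows_in_span(J + [unit(1, width), unit(2, width)])
print(case_k1, case_k0)
```

In both cases the count matches $2s - k$: a 1row inserted in the middle block contributes only itself, while a 1row in the first $n_b$ positions also releases its partner in the last $n_b$ positions.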

Note that the proof depends only on the constraint that $B_1$ and $B_2$ are diagonal with non-zero diagonal elements. Hence, we set $B_1 = B_2 = I_{n_b \times n_b}$. Lemmas 10 and 11 exclude certain regions of rectangle ABCD in Figure 13 since optimal codes do not exist over them.
Lemma 10 Optimal codes are not contained in the region $n_b < l_b < n_a$.

Proof: Let

$$P = \begin{bmatrix} I_{n_a \times n_a} & \begin{matrix} I_{n_b \times n_b} \\ 0_{(n_a - n_b) \times n_b} \end{matrix} \end{bmatrix}.$$

We form matrix $L$ by inserting distinct 1rows into $P$. Adding $m_a$ bits of redundancy at transmitter $a$ generates $m_a$ 1rows in $L$. These 1rows have their 1s in the first $n_a$ positions. Thus, if further increase in the number of 1rows in $L$ is desired by adding redundant bits at transmitter $b$, then the size of the redundancy added at transmitter $b$ must satisfy

$$m_b \ge m_a + n_a - n_b \ge n_a - n_b.$$

Thus $m_b = 0$ (no redundancy added at transmitter $b$) or $m_b \ge n_a - n_b$, which completes the proof. □

Lemma 11 Codes that achieve the maximal code rate and capacity are not contained on the line $l_a = l_b$.
Proof: Let $P$ be defined as before. Coding on this line results in the insertion of at least one row vector to $P$ that is not a 1row. The inserted rows that are not 1rows contain one 1 in the first $n_a$ positions and one 1 in the last $n_b$ positions. The other elements are 0. The number of 1rows determines the size of the subset recoverable at the receiver. In this case, the rows that are not 1rows increase redundancy but do not increase the number of information bits recoverable at the receiver. Thus, we should not insert any row that is not a 1row. This is not possible on the line $m_a = m_b - [n_a - n_b]$. Hence, this line does not contain optimal codes and the statement of the lemma follows. □



Structure of Generator Matrices 

We now develop the structure of the generator matrices for the encoders at the two transmitters. We use systematic codes and show that they are optimal in terms of achieving maximal code rate and capacity. Using the results of Lemmas 10 and 11, we now reduce the rectangular region ABCD in Figure 13 into three sub-regions shown in Figure 14. Let us first establish two cases:
Case 1: $m_b = 0$.

In this case, redundancy is added at transmitter $a$ only. Let the redundancy added to $a$ be $m_a$. This corresponds to appending $m_a$ 1row vectors to $P$ such that the 1 in each of these vectors lies in the first $n_a$ positions and the resulting matrix consists of independent rows. Using Lemma 9, the maximal number of 1rows that can be generated is $2(m_a + n_a - n_b) - (n_a - n_b) = 2m_a + n_a - n_b$. Now, $[n_a - n_b] + m_a$ of the 1rows generated will have their 1 in the first $n_a$ positions and $m_a$ 1rows will have their 1 in the last $n_b$ positions. Thus we have:

$$v_a = m_a + [n_a - n_b], \qquad v_b = m_a,$$

$$L_a = \begin{bmatrix} I_{n_a \times n_a} \\ M_{m_a \times n_a} \\ 0_{(n_b - m_a) \times n_a} \end{bmatrix}, \qquad L_b = \begin{bmatrix} I_{n_b \times n_b} \\ 0_{n_a \times n_b} \end{bmatrix},$$

where $m_b = 0$, $0 \le m_a \le n_a$ and $M$ is a matrix containing 1rows.

Case 2: $n_a - n_b \le m_b \le n_a$.

Let $m_b = n_a - n_b + k$, where $0 \le k \le n_b$.

When $m_a < k$, $m_a$ 1rows are appended to $P$ such that each 1row contains a 1 in the first $n_a$ positions. Then, $k - m_a$ 1rows are appended to the matrix resulting from the previous step so that a 1 is contained in one of the last $n_b$ positions. The 1rows are appended such that all rows of the resulting matrix are independent. Using Lemma 9, we see that the maximal number of 1rows that can be generated is given by $2m_b - [n_a - n_b]$. There are $m_b$ 1rows with 1 in the first $n_a$ positions and $m_b - [n_a - n_b]$ 1rows with 1 in the last $n_b$ positions. Thus we have:

$$v_a = m_b, \qquad v_b = m_b - [n_a - n_b],$$

$$L_a = \begin{bmatrix} I_{n_a \times n_a} \\ A_{m_a \times n_a} \\ 0_{(n_b - m_a) \times n_a} \end{bmatrix}, \qquad L_b = \begin{bmatrix} I_{n_b \times n_b} \\ 0_{(n_a - n_b + m_a) \times n_b} \\ \hat{A}_{(m_b - m_a - [n_a - n_b]) \times n_b} \\ 0_{(n_a - m_b) \times n_b} \end{bmatrix},$$

where $[n_a - n_b] \le m_b \le n_a$ and $0 \le m_a < m_b - [n_a - n_b]$. $A$ and $\hat{A}$ are matrices containing unique 1rows.

When $m_a > k$, coding involves appending $k$ 1rows to $P$ such that each row contains the 1 in the last $n_b$ positions. Then, $m_a - k$ 1rows are appended to the matrix resulting from the previous step so that a 1 is contained in the first $n_a$ positions for each vector. The 1rows are appended such that the resulting matrix consists of independent rows. Using Lemma 9, we see that the maximum number of 1rows that can be generated is given by $2m_a + [n_a - n_b]$. There are $m_a$ 1rows with 1 in the last $n_b$ positions and $m_a + [n_a - n_b]$ 1rows with 1 in the first $n_a$ positions. Thus, we have:

$$v_a = m_a + [n_a - n_b], \qquad v_b = m_a,$$

$$L_a = \begin{bmatrix} I_{n_a \times n_a} \\ 0_{(m_b - [n_a - n_b]) \times n_a} \\ S_{(m_a - m_b + [n_a - n_b]) \times n_a} \\ 0_{(n_b - m_a) \times n_a} \end{bmatrix}, \qquad L_b = \begin{bmatrix} I_{n_b \times n_b} \\ 0_{(n_a - n_b) \times n_b} \\ K_{(m_b - [n_a - n_b]) \times n_b} \\ 0_{(n_b - m_b + n_a - n_b) \times n_b} \end{bmatrix},$$

where $[n_a - n_b] \le m_b \le n_a$ and $m_b - [n_a - n_b] < m_a \le n_b$. $S$ and $K$ are matrices containing 1rows.



Regions 

The regions over which optimal codes exist can be described and are shown in Figure 14. We denote the code rate in regions 1, 2 and 3 as $C_{rate-R_1}$, $C_{rate-R_2}$ and $C_{rate-R_3}$, respectively.

Region 1: $n_a \le l_a \le n_a + n_b$ and $l_b = n_b$,

$$v_a = m_a + [n_a - n_b], \qquad v_b = m_a, \qquad C_{rate-R_1} = \frac{2m_a + [n_a - n_b]}{n_a + n_b + m_a}.$$

Region 2: $l_b < l_a \le n_a + n_b$ and $n_a \le l_b \le n_a + n_b$,

$$v_a = m_a + [n_a - n_b], \qquad v_b = m_a, \qquad C_{rate-R_2} = \frac{2m_a + [n_a - n_b]}{n_a + n_b + m_a + m_b}.$$

Region 3: $n_a \le l_a < l_b$ and $n_a \le l_b \le n_a + n_b$,

$$v_a = m_b, \qquad v_b = m_b - [n_a - n_b], \qquad C_{rate-R_3} = \frac{2m_b - [n_a - n_b]}{n_a + n_b + m_a + m_b}.$$

Figure 14: Gross (un-optimized) regions over which optimal codes exist.

Optimized Regions 

Lemma 12 To achieve the maximum code rate, it suffices to add redundancy to only one vector.

Proof: We see from Figure 14 that in Region 1 and Region 2, $v_a$ and $v_b$ do not depend upon $m_b$. Thus, for higher code rate, $m_b$ should be kept as low as possible. We thus set $m_b = 0$ for Region 1 and $m_b = n_a - n_b + 1$ for Region 2. As $n_a \ge n_b$,

$$C_{rate-R_1} \ge C_{rate-R_2}.$$

Thus optimal codes cannot be in Region 2, as this region does not contain codes with higher code rate than Region 1. Hence, we do not consider this region in our further search for optimal codes. In Region 3, $v_a$ and $v_b$ do not depend on $m_a$. Therefore, it is best to keep $m_a$ at its lowest, i.e. $m_a = 0$. We thus consider codes over Region 1 and Region 3, where we set $m_b = 0$ and $m_a = 0$, respectively. Thus, in order to achieve the optimal code rate, it suffices to add redundancy at only one transmitter. □
The optimized regions are shown in Figure 15. The code rates in regions A and B are denoted as $C_{rate-R_A}$ and $C_{rate-R_B}$, respectively.

Region A: $n_a \le l_a \le n_a + n_b$ and $l_b = n_b$,

$$v_a = m_a + [n_a - n_b], \qquad v_b = m_a, \qquad C_{rate-R_A} = \frac{2m_a + [n_a - n_b]}{n_a + n_b + m_a}.$$

Region B: $l_a = n_a$ and $n_a + 1 \le l_b \le n_a + n_b$,

$$v_a = m_b, \qquad v_b = m_b - [n_a - n_b], \qquad C_{rate-R_B} = \frac{2m_b - [n_a - n_b]}{n_a + n_b + m_b}.$$

Figure 15: Optimized regions.


Achieving the capacity region and maximal code rate 

Lemma 12 shows that in order to achieve the maximal code rate, it suffices to add redundancy at only one transmitter. Let the redundancy be $m$. In Region A, $m_a = m$, $0 \le m \le n_b$ and

$$C_{rate-R_A} = \frac{2m + [n_a - n_b]}{n_a + n_b + m}.$$

In Region B, $m_b = m$, $(n_a - n_b + 1) \le m \le n_b$ and

$$C_{rate-R_B} = \frac{2m - [n_a - n_b]}{n_a + n_b + m}.$$

Case 1: $n_a > n_b$. When $0 \le m \le n_a - n_b$, Region B is excluded and Region A provides the only solution. For all other $m$, $C_{rate-R_A} \ge C_{rate-R_B}$. Thus, Region A always provides a higher code rate than Region B. From the code rate equations derived earlier, we see that the maximal code rate is obtained when $m$ is largest, i.e. $m = n_b$. Therefore, $(m_a, m_b) = (n_b, 0)$ is the optimal point. This corresponds to $(l_a, l_b) = (n_a + n_b, n_b)$. Thus, for obtaining the maximal code rate, we add redundancy to only the larger transmit vector and the size of the redundancy is the size of the smaller transmit vector. The code rate is

$$C_{rate} = \frac{n_a + n_b}{n_a + 2n_b}. \tag{48}$$
Case 2: $n_a = n_b = n$. Here, for a given $m$, both regions give the same code rate and we can add redundancy to either of the two vectors. A symmetry exists about the line $l_a = l_b$ and there are two optimal points. Code rate is maximal when $m$ is maximal, i.e. $m = n$. These points are $(m_a, m_b) \in \{(0, n), (n, 0)\}$ corresponding to $(l_a, l_b) \in \{(2n, n), (n, 2n)\}$. In this case, coding results in the size of redundancy being equal to the transmit vector size and the code rate is 2/3.
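The two cases can be checked by a direct search over the optimized regions. The sketch below evaluates the Region A and Region B code rate expressions exactly with rational arithmetic and returns the best point; Region B is enumerated through its definition $l_a = n_a$, $n_a + 1 \le l_b \le n_a + n_b$ with $m_b = l_b - n_b$. The function names and the example sizes $(5, 3)$ and $(4, 4)$ are illustrative assumptions.

```python
from fractions import Fraction

def rate_region_a(n_a, n_b, m_a):
    """Code rate in Region A (redundancy m_a at transmitter a, m_b = 0)."""
    return Fraction(2 * m_a + (n_a - n_b), n_a + n_b + m_a)

def rate_region_b(n_a, n_b, m_b):
    """Code rate in Region B (redundancy m_b at transmitter b, m_a = 0)."""
    return Fraction(2 * m_b - (n_a - n_b), n_a + n_b + m_b)

def best_code(n_a, n_b):
    """Search both regions; returns (rate, m_a, m_b). Assumes n_a >= n_b."""
    candidates = [(rate_region_a(n_a, n_b, m), m, 0) for m in range(n_b + 1)]
    # Region B: l_a = n_a and n_a + 1 <= l_b <= n_a + n_b, with m_b = l_b - n_b.
    candidates += [(rate_region_b(n_a, n_b, l_b - n_b), 0, l_b - n_b)
                   for l_b in range(n_a + 1, n_a + n_b + 1)]
    return max(candidates, key=lambda c: c[0])

print(best_code(5, 3))   # unequal transmit vector sizes
print(best_code(4, 4))   # equal transmit vector sizes
```

For unequal sizes the search lands on $(m_a, m_b) = (n_b, 0)$ with rate $(n_a + n_b)/(n_a + 2n_b)$, and for equal sizes both regions tie at 2/3.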
The transmission rates of the code are

$$R_a = \frac{n_a}{n_a + n_b}, \tag{49}$$
$$R_b = \frac{n_b}{n_a + n_b}, \tag{50}$$
$$R^{MA}_{sum} = R_a + R_b = 1. \tag{51}$$

We see from (48) and (51) that this code achieves the maximum code rate and capacity for this channel and is thus an optimal code. Moreover, this code obeys the property that no redundancy is added to the smaller transmit vector. Theorem 16 proves that any maximal code rate achieving code satisfies this property.



A.4.2 

Proof of Theorem 16

We first prove the forward part. Let a code be capacity approaching without redundancy being added to the smaller transmit vector. From the code construction described in Section 4.5, we see that such codes exist. We thus have the following relations:

$$S = n_a + n_b,$$
$$n_b = \min(l_a, l_b),$$
$$C_{rate} = \frac{n_a + n_b}{\min(l_a, l_b) + S} = \frac{n_a + n_b}{n_a + 2n_b}.$$

Therefore, capacity approaching codes with no redundancy added to the smaller transmit vector achieve the maximal code rate by (17). This completes the forward part of the proof.
We prove the reverse part now. A code that achieves the maximal code rate must meet the inequality bounding the code rate with equality. From the code construction described in Section 4.5, we know that codes that achieve the maximal code rate exist. Therefore, maximal code rate achieving codes must satisfy

$$S = n_a + n_b,$$
$$\min(l_a, l_b) = n_b.$$

Hence, maximal code rate achieving codes achieve capacity and do not add redundancy to the smaller transmit vector. The reverse part of the proof is now complete and we have proved the theorem. □
Proof of Theorem 17

Let the probability that transmitters $a$ and $b$ have a codeword to transmit in a slot be $p_a$ and $p_b$ respectively. Transmissions begin at the start of a slot and the transmitters do not coordinate. We again define the sizes of $a$ and $b$ as $n_a$ and $n_b$. Consider the expected code rate. The mean sizes of $a$ and $b$ are $p_a n_a$ and $p_b n_b$ respectively. We will therefore define the mean sizes $n'_1$, $n'_2$, where $n'_2 \le n'_1$, as

$$n'_1 = \max(p_a n_a, p_b n_b), \qquad n'_2 = \min(p_a n_a, p_b n_b).$$

The maximal expected code rate is thus

$$C_{rate} = \frac{n'_1 + n'_2}{n'_1 + 2n'_2},$$

where (from Theorem 16) no redundancy is added to the transmit vector with smaller mean size. Therefore, two cases arise.

Case 1: We add redundancy only at $a$ and not at $b$ if the mean size of $a$ is greater than the mean size of $b$. This implies that

$$n'_1 = p_a n_a, \qquad n'_2 = p_b n_b, \qquad C_{rate} = \frac{p_a n_a + p_b n_b}{p_a n_a + 2p_b n_b}.$$

Case 2: We add redundancy only at $b$ and not at $a$ if the mean size of $a$ is smaller than the mean size of $b$, which implies that

$$n'_1 = p_b n_b, \qquad n'_2 = p_a n_a, \qquad C_{rate} = \frac{p_a n_a + p_b n_b}{2p_a n_a + p_b n_b}.$$

When $n'_1 = n'_2$, we can use either technique. Figure 16 shows the regions of the two dimensional space of $(p_a, p_b)$ where the cases apply. Region 1 corresponds to the first case and Region 2 to the second.

Let us denote $\alpha = p_a / p_b$ and $\beta = n_a / n_b$. We have $\alpha \in [0, \infty)$ and $\beta \ge 1$. Thus, the maximal expected code rate expression can be written as

$$C_{rate} = \frac{1 + \alpha\beta}{1 + \alpha\beta + \min(1, \alpha\beta)}.$$

For $\alpha \in [0, 1/\beta]$, the mean size of $a$ is less than or equal to the mean size of $b$ and we add redundancy only at $b$. For $\alpha \in [1/\beta, \infty)$ the mean size of $a$ is larger than or equal to the mean size of $b$, and we add redundancy only at $a$. Note that for $\alpha = 1/\beta$, we may add redundancy at $a$ or $b$ and still obtain the same expected code rate. The expected code rate is minimal and has a value of 2/3 when $p_a n_a = p_b n_b$. Figure 17 shows how the expected code rate changes with $\alpha$.

MIT LIDS Technical Report 2687, March 2006

Figure 17: Variation of code rate with $\alpha$.

We now look at the limit when transmitter $a$ stops transmitting, i.e. $\alpha \to 0$, and when transmitter $b$ stops transmitting, i.e. $\alpha \to \infty$. Evaluating the value of expected code rate as $\alpha$ tends to 0 or $\infty$, we get

$$\lim_{\alpha \to 0} C_{rate}(\alpha, \beta) = 1, \qquad \lim_{\alpha \to \infty} C_{rate}(\alpha, \beta) = 1.$$

These limits are what we had expected since, in both cases, one transmitter transmits in a slot with probability 1 and the other does not transmit at all. There is no multiple access interference and the average code rate becomes the code rate of a point-to-point noiseless channel, i.e. 1.

When $\beta = 1$, i.e. $n_a = n_b$, we see that if $p_b < p_a$, we add redundancy only at $a$ and when $p_b > p_a$ we add redundancy only at $b$. The statement of the theorem follows. □
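The behaviour of the expected code rate can be tabulated directly from the expression $(1 + \alpha\beta)/(1 + \alpha\beta + \min(1, \alpha\beta))$. The sketch below uses exact rational arithmetic; the function name and the sample values of $\alpha$ and $\beta$ are illustrative.

```python
from fractions import Fraction

def expected_code_rate(alpha, beta):
    """Maximal expected code rate for alpha = p_a / p_b and beta = n_a / n_b."""
    ab = Fraction(alpha) * Fraction(beta)
    return (1 + ab) / (1 + ab + min(1, ab))

beta = Fraction(2)
at_balance = expected_code_rate(Fraction(1, 2), beta)   # alpha = 1 / beta
a_silent = expected_code_rate(0, beta)                   # alpha -> 0
b_nearly_silent = expected_code_rate(10**6, beta)        # large alpha
print(at_balance, a_silent, float(b_nearly_silent))
```

The rate dips to 2/3 at $\alpha\beta = 1$, where the mean sizes are equal, and approaches 1 at both extremes, in line with the limits computed in the proof.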



Appendix 5 



Proof of Theorem 18

In this case, the linear encoder is a matrix of dimension

$$(\lceil nR_1 \rceil + \lceil nR_2 \rceil + \lceil nR_3 \rceil + \lceil nR_{12} \rceil + \lceil nR_{23} \rceil + \lceil nR_{13} \rceil + \lceil nR_{123} \rceil) \times n.$$

The first $\lceil nR_1 \rceil$ bits of the output go to receiver 1 only. The subsequent $\lceil nR_2 \rceil$ and $\lceil nR_3 \rceil$ bits similarly go to receivers 2 and 3, respectively. Next come, in order, the rate-$R_{12}$, $R_{23}$, $R_{13}$, and $R_{123}$ descriptions. We again use typical set decoding.

Given the linear structure of the code, we can break encoder matrix $A_n$ into a collection of $\lceil nR_a \rceil \times n$ sub-matrices, $a \in \{1, 2, 3, 12, 23, 13, 123\}$, such that

$$A_n = \begin{bmatrix} A_{1,n} \\ A_{2,n} \\ A_{3,n} \\ A_{12,n} \\ A_{23,n} \\ A_{13,n} \\ A_{123,n} \end{bmatrix}.$$

We begin by bounding the expected probability of decoding in error at receiver 1, here denoted as $E[P_e(A_{1,n}, A_{12,n}, A_{13,n}, A_{123,n})]$. The arguments for receivers 2 and 3 are similar. By the union bound, the code error probability is bounded by the sum of the individual decoder error probabilities.

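An error occurs at receiver 1 if any subset of the desired sources is decoded in error. Thus, following our standard approach,

$$E[P_e(A_{1,n}, A_{12,n}, A_{13,n}, A_{123,n})] \le \epsilon + \sum_{(u_1^n, u_{12}^n, u_{13}^n, u_{123}^n) \in A_\epsilon^{(n)}} p(u_1^n, u_{12}^n, u_{13}^n, u_{123}^n) \sum_{s \subseteq S_1,\, s \ne \phi} \; \sum_{\hat{u}_s \ne u_s} \Pr(A_{s,n}(u_s - \hat{u}_s) = 0)$$
$$\le \epsilon + \sum_{s \subseteq S_1,\, s \ne \phi} 2^{n(H(U_s | U_{S_1 - s}) + 2\epsilon)}\, 2^{-(nR)_s}$$

for some $\epsilon > 0$. □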



Proof of Lemma 13

By PU Theorem 14.6.1, Theorem 14.6.2], the capacity of the given channel is the convex hull of the closure of all $(R_1, R_2)$ satisfying $R_2 \le I(W; Y_2)$ and $R_1 \le I(X; Y_1 | W)$ for some joint distribution $p(w)p(x|w)p(y_1|x)p(y_2|y_1)$. Here $W$ is an auxiliary random variable with alphabet size 2 and $p(y_2|y_1)$ is derived from the physically degraded channel model. By a symmetry argument, the optimal $W$ is a uniform binary random variable with $p(x|w) = 1 - \beta$ if $x = w$ and $p(x|w) = \beta$ otherwise. Thus

$$R_1 \le I(X; Y_1 | W)$$
$$= I(X; Y_1) - I(W; Y_1)$$
$$= (1 - q_1(1)) - [H((1 - q_1(1))/2,\ q_1(1),\ (1 - q_1(1))/2) - H((1 - \beta)(1 - q_1(1)),\ q_1(1),\ \beta(1 - q_1(1)))]$$
$$= (1 - q_1(1))H(\beta),$$

$$R_2 \le I(W; Y_2)$$
$$= H((1 - q_1(1))(1 - \alpha)/2,\ q_1(1) + (1 - q_1(1))\alpha,\ (1 - q_1(1))(1 - \alpha)/2) - H((1 - \beta)(1 - q_1(1))(1 - \alpha),\ q_1(1) + (1 - q_1(1))\alpha,\ \beta(1 - q_1(1))(1 - \alpha))$$
$$= (1 - q_1(1))(1 - \alpha)(1 - H(\beta))$$
$$= (1 - q_2(1))(1 - H(\beta)).$$

Varying $\beta$ from 0 to 1 gives the independent coding result. The common information result comes from P3 Theorem 14.6.4]. □
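The two simplifications in this proof can be verified numerically by computing the ternary output entropies directly and comparing them with the closed forms $(1 - q_1(1))H(\beta)$ and $(1 - q_1(1))(1 - \alpha)(1 - H(\beta))$. In the sketch below, `q1` stands for $q_1(1)$, entropies are in bits, and the function names and the sample parameter values are assumptions of this illustration.

```python
import math

def entropy(probs):
    """Entropy in bits of a probability vector (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def h2(x):
    """Binary entropy in bits."""
    return entropy((x, 1 - x))

def rates_from_entropies(q1, alpha, beta):
    """R1 and R2 bounds computed from the ternary output entropies."""
    s = 1 - q1                      # probability that Y1 is not erased
    r1 = s - (entropy((s / 2, q1, s / 2))
              - entropy(((1 - beta) * s, q1, beta * s)))
    e2 = q1 + s * alpha             # erasure probability seen at receiver 2
    t = s * (1 - alpha)             # probability that Y2 is not erased
    r2 = (entropy((t / 2, e2, t / 2))
          - entropy(((1 - beta) * t, e2, beta * t)))
    return r1, r2

def rates_closed_form(q1, alpha, beta):
    """Closed forms obtained at the end of the derivation."""
    s = 1 - q1
    return s * h2(beta), s * (1 - alpha) * (1 - h2(beta))

q1, alpha, beta = 0.2, 0.3, 0.1
print(rates_from_entropies(q1, alpha, beta))
print(rates_closed_form(q1, alpha, beta))
```

The agreement reflects the fact that the erasure symbol contributes the same term to both ternary entropies, so only the binary part scaled by the non-erasure probability survives.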



Strategy for moving linear codes beyond the time-sharing bound:

Consider a systematic code with a low density parity-check matrix. Let the encoding matrix be

$$B_n = \begin{bmatrix} I & \\ P_{11} & P_{21} \\ 0 & P_{22} \end{bmatrix},$$

where $I$ is the $(\lfloor nR_1 \rfloor + \lfloor nR_2 \rfloor) \times (\lfloor nR_1 \rfloor + \lfloor nR_2 \rfloor)$ identity matrix and $P_{11}$, $P_{21}$, and $P_{22}$ have dimensions

$$nR_1\frac{H(Z_1)}{1 - H(Z_1)} \times nR_1, \qquad nR_1\frac{H(Z_1)}{1 - H(Z_1)} \times nR_2, \qquad \left(n - nR_2 - nR_1\frac{1}{1 - H(Z_1)}\right) \times nR_2,$$

respectively. (We here drop the rounding notation for readability but note that all of the above quantities must be integers.) For each $i \in \{1, 2\}$, let $Z_i = [Z_{i1}\, Z_{i2}\, Z_{i3}\, Z_{i4}]$, where the sub-vectors have lengths $nR_1$, $nR_2$, $nR_1 H(Z_1)/(1 - H(Z_1))$, and $n - nR_2 - nR_1/(1 - H(Z_1))$, respectively. Applying the above code, the channel output at receiver 2 is

$$Y_2 = \begin{bmatrix} V_1 + Z_{21} \\ V_2 + Z_{22} \\ P_{11}V_1 + P_{21}V_2 + Z_{23} \\ P_{22}V_2 + Z_{24} \end{bmatrix}.$$

If the decoder at that receiver applies parity check matrix $P_{11}$ to the received $V_1 + Z_{21}$ and subtracts off the outcome from the third component of the received vector, then the modified signal is

$$\begin{bmatrix} V_1 + Z_{21} \\ V_2 + Z_{22} \\ P_{21}V_2 + Z_{23} + P_{11}Z_{21} \\ P_{22}V_2 + Z_{24} \end{bmatrix}.$$

Decoder 2 thereby recovers more of its parity check symbols at the expense of increasing the corresponding error probability in those symbols. When the density of parity check matrix $P_{11}$ is low, the increase in error probability for symbols $P_{21}V_2$ may also be low enough to make those parity check bits useful in decoding the description of $V_2$. Receiver 1 uses the same technique to decode $V_2$, then subtracts off its impact on the parity check bits for $V_1$, and finally decodes $V_1$.
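The stripping step at decoder 2 is a linear identity over the binary field and can be checked on random data. The sketch below draws small random matrices and vectors, forms the first and third received blocks, and confirms that adding $P_{11}$ times the first block to the third block leaves $P_{21}V_2 + Z_{23} + P_{11}Z_{21}$. The block sizes, the densities and the helper names are hypothetical choices made only for this illustration.

```python
import random

def matvec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def add(u, v):
    """Vector addition over GF(2) (identical to subtraction)."""
    return [a ^ b for a, b in zip(u, v)]

def rand_bits(rng, n, p=0.5):
    """Random binary vector whose entries equal 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(1)
k1, k2, r = 6, 5, 4   # hypothetical sizes for V1, V2 and the shared parity block
P11 = [rand_bits(rng, k1, p=0.2) for _ in range(r)]   # sparse, as the text suggests
P21 = [rand_bits(rng, k2) for _ in range(r)]
V1, V2 = rand_bits(rng, k1), rand_bits(rng, k2)
Z21, Z23 = rand_bits(rng, k1, p=0.1), rand_bits(rng, r, p=0.1)

y1 = add(V1, Z21)                                       # first received block
y3 = add(add(matvec(P11, V1), matvec(P21, V2)), Z23)    # third received block
stripped = add(y3, matvec(P11, y1))                     # remove V1's contribution
expected = add(add(matvec(P21, V2), Z23), matvec(P11, Z21))
print(stripped == expected)
```

The residual term $P_{11}Z_{21}$ is the extra noise the text refers to: each parity symbol picks up the sum of the noise bits selected by one row of $P_{11}$, which is why a sparse $P_{11}$ keeps the added error probability small.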